Is this simple SQL query correct? - sql

The query below is pretty self-explanatory, and although I'm not good at SQL, I can't find anything wrong with it. However, the number it yields is not in accordance with my gut feeling, and I would like it double-checked, if this is appropriate for StackOverflow.
I'm simply trying to get the number of users that joined my website in 2020, and also made a payment in 2020. I'm trying to figure out "new revenue".
This is the query:
SELECT Count(DISTINCT( auth_user.id )) AS "2020" -- assuming auth_user.id is the key that user_id references
FROM auth_user
JOIN subscription_transaction
ON ( subscription_transaction.event = 'one-time payment'
AND subscription_transaction.user_id = auth_user.id
AND subscription_transaction.timestamp >= '2020-01-01'
AND subscription_transaction.timestamp <= '2020-12-31' )
WHERE auth_user.date_joined >= '2020-01-01'
AND auth_user.date_joined <= '2020-12-31';
I use PostgreSQL 10.
Thanks in advance!

I would write the query using EXISTS to get rid of the COUNT(DISTINCT):
SELECT count(*) AS "2020"
FROM auth_user au
WHERE au.date_joined >= '2020-01-01' AND
      au.date_joined < '2021-01-01' AND
      EXISTS (SELECT 1
              FROM subscription_transaction st
              WHERE st.event = 'one-time payment' AND
                    st.user_id = au.id AND -- assuming auth_user.id is the key that user_id references
                    st.timestamp >= '2020-01-01' AND
                    st.timestamp < '2021-01-01'
             ) ;
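This should be faster than your version. The results should be the same, with one exception: if the columns are timestamps rather than dates, <= '2020-12-31' stops at midnight at the start of that day, whereas < '2021-01-01' covers the whole of December 31.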
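To see how the two range styles can diverge, here is a minimal sketch that runs both queries against an in-memory SQLite database from Python. The sample rows are made up, auth_user.id is an assumed key column, and SQLite compares the timestamps as text, so this only mirrors what PostgreSQL does when it casts '2020-12-31' to midnight.

```python
import sqlite3

# Hypothetical miniature of the two tables; auth_user.id is an assumed key.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE auth_user (id INTEGER PRIMARY KEY, date_joined TEXT);
CREATE TABLE subscription_transaction (user_id INTEGER, event TEXT, timestamp TEXT);
INSERT INTO auth_user VALUES
  (1, '2020-03-05 10:00:00'),
  (2, '2020-12-31 18:30:00'),   -- joined on the last day of the year
  (3, '2019-11-01 09:00:00');   -- joined before 2020
INSERT INTO subscription_transaction VALUES
  (1, 'one-time payment', '2020-04-01 12:00:00'),
  (1, 'one-time payment', '2020-06-01 12:00:00'),
  (2, 'one-time payment', '2020-12-31 19:00:00'),
  (3, 'one-time payment', '2020-02-01 08:00:00');
""")

# Join + COUNT(DISTINCT) with closed ranges, as in the question.
closed_range = conn.execute("""
SELECT count(DISTINCT au.id)
FROM auth_user au
JOIN subscription_transaction st
  ON st.user_id = au.id
 AND st.event = 'one-time payment'
 AND st.timestamp >= '2020-01-01' AND st.timestamp <= '2020-12-31'
WHERE au.date_joined >= '2020-01-01' AND au.date_joined <= '2020-12-31'
""").fetchone()[0]

# EXISTS with half-open ranges, as in the answer.
half_open = conn.execute("""
SELECT count(*)
FROM auth_user au
WHERE au.date_joined >= '2020-01-01' AND au.date_joined < '2021-01-01'
  AND EXISTS (SELECT 1
              FROM subscription_transaction st
              WHERE st.user_id = au.id
                AND st.event = 'one-time payment'
                AND st.timestamp >= '2020-01-01' AND st.timestamp < '2021-01-01')
""").fetchone()[0]

print(closed_range, half_open)  # prints: 1 2
```

User 2 joined and paid during December 31, so only the half-open version counts them. User 1 paid twice but is counted once either way.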


Adding an aggregate condition to get total count of sub-group

Thanks for the help on this matter; I'm new to SQL. I'm trying to get a sub-count of Jedi who had more than 2 padawans last month. I tried putting the condition in WHERE, but I get an error saying I can't include aggregates in it. I also tried using a CASE, but kept getting a syntax error there too. Any help on this would be incredible. Thank you so much!
SELECT COUNT(DISTINCT old_republic.jedi_id), old_republic.region_id
FROM jedi_archives.old_repulicdata old_republic
WHERE old_republic.republic_date >= '2022-06-01' AND old_republic.republic_date <= '2022-06-30' AND COUNT(old_republic.padawan)>2
GROUP BY old_republic.region_id
ORDER BY old_republic.region_id
SELECT old_republic.jedi_id CASE (
WHEN Count(old_republic.padawan)>2
ELSE 0 End), old_republic.region_id
FROM jedi_archives.old_repulicdata old_republic
WHERE old_republic.republic_date >= '2022-06-01' AND old_republic.republic_date <= '2022-06-30'
GROUP BY old_republic.region_id
ORDER BY old_republic.region_id
I can't comment to ask for a fiddle, but from what you've written, you're probably looking for the HAVING clause.
Assuming that padawan denotes the number of Padawans:
SELECT region_id, jedi_id, sum(padawan)
FROM jedi_archives.old_republicdata
WHERE republic_date >= '2022-06-01'
AND republic_date <= '2022-06-30'
GROUP BY region_id, jedi_id
HAVING sum(padawan) > 2;
This query returns the total number of Padawans for each region and Jedi, keeping only the Jedi who had more than two Padawans in that region last month. If you don't want to take the region into account, remove it from the SELECT and GROUP BY clauses. Other Jedi won't appear in the result.
You can use the CASE expression, too, in order to indicate whether a Jedi had more than two padawans:
SELECT region_id, jedi_id,
CASE WHEN sum(padawan) > 2 THEN 1 ELSE 0 END AS more_than_2_padawans
FROM jedi_archives.old_republicdata
WHERE republic_date >= '2022-06-01'
AND republic_date <= '2022-06-30'
GROUP BY region_id, jedi_id;
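If what you need is the count of such Jedi per region rather than the list, one possible approach is to wrap the HAVING query in a subquery and count its rows. The sketch below runs this against an in-memory SQLite database from Python. The sample rows are invented, and it assumes, like the queries above, that padawan holds a number of Padawans.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE old_republicdata (region_id INTEGER, jedi_id INTEGER,
                               padawan INTEGER, republic_date TEXT);
INSERT INTO old_republicdata VALUES
  (1, 10, 2, '2022-06-03'),
  (1, 10, 1, '2022-06-20'),   -- Jedi 10: 3 padawans in region 1
  (1, 11, 1, '2022-06-10'),   -- Jedi 11: only 1
  (2, 12, 4, '2022-06-15'),   -- Jedi 12: 4 in region 2
  (2, 13, 5, '2022-05-30');   -- outside June, ignored
""")

# Inner query: one row per (region, Jedi) with more than two padawans.
# Outer query: count those rows per region.
rows = conn.execute("""
SELECT region_id, count(*) AS jedi_count
FROM (SELECT region_id, jedi_id
      FROM old_republicdata
      WHERE republic_date >= '2022-06-01' AND republic_date < '2022-07-01'
      GROUP BY region_id, jedi_id
      HAVING sum(padawan) > 2) AS busy_jedi
GROUP BY region_id
ORDER BY region_id
""").fetchall()

print(rows)  # prints: [(1, 1), (2, 1)]
```

Regions where no Jedi passes the threshold do not appear at all in this version.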
I'm not entirely sure without sample data. But I think using the HAVING clause could solve your question.
SELECT COUNT(jedi_id) as jedi_id, region_id FROM tableA
WHERE republic_date between '2022-05-20' and '2022-05-25'
GROUP BY region_id
having sum(padawan) > 2

Retrieve data if next line of data equals a particular value

I am very new to SQL and I need some assistance with a query.
I am writing a script that reviews a log file. The query retrieves the time at which a particular status first occurred, and this works as expected. I would now like to add a condition so that a row only counts if the status immediately after it is 'Accepted' or 'Attended'. How would I do this? I have pasted the current script below and commented in italics where I think this condition should go. Any help would be greatly appreciated!
Select j.jobcode, min(log.timestamp) as 'Time First Assigned'
from Job J
inner join JobLog Log
on J.JobID = Log.JobID
and log.JobStatusID = 'Assigned' *-- and record after this equals accepted or attended*
where j.CompletionDate >= #Start_date
and j.CompletionDate < #End_date
Group by j.jobcode
I recommend lead(), but using it in a subquery on one table:
select j.jobcode, min(jl.timestamp) as time_first_assigned
from Job j join
     (select jl.*,
             lead(jl.JobStatusID) over (partition by jl.jobid order by jl.timestamp) as next_status
      from JobLog jl
     ) jl
     on j.JobID = jl.JobID
where jl.JobStatusID = 'Assigned' and
      jl.next_status in ('accepted', 'attended') and
      j.CompletionDate >= #Start_date and
      j.CompletionDate < #End_date
group by j.jobcode
In particular, this enables the optimizer to use an index on JobLog(jobid, timestamp, JobStatusId) for the lead(). That said, this will not always improve performance, particularly if the filter on the CompletionDate filters out most rows.
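For anyone unsure how lead() behaves, here is a small sketch run against an in-memory SQLite database from Python (window functions need SQLite 3.25 or later). The JobLog rows are invented, the timestamp column is named ts here for brevity, and the Job table and date filter are left out to keep the focus on lead().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE JobLog (JobID INTEGER, JobStatusID TEXT, ts TEXT);
INSERT INTO JobLog VALUES
  (1, 'Assigned', '2021-01-01 09:00'),
  (1, 'Accepted', '2021-01-01 09:05'),
  (2, 'Assigned', '2021-01-01 10:00'),
  (2, 'Rejected', '2021-01-01 10:10'),   -- first assignment was not accepted
  (2, 'Assigned', '2021-01-01 11:00'),
  (2, 'Attended', '2021-01-01 11:30');
""")

# lead() attaches the next row's status (per job, in time order) to each row,
# so the outer filter can look one step ahead.
rows = conn.execute("""
SELECT JobID, min(ts) AS time_first_assigned
FROM (SELECT JobID, JobStatusID, ts,
             lead(JobStatusID) OVER (PARTITION BY JobID ORDER BY ts) AS next_status
      FROM JobLog) AS jl
WHERE JobStatusID = 'Assigned'
  AND next_status IN ('Accepted', 'Attended')
GROUP BY JobID
ORDER BY JobID
""").fetchall()

print(rows)  # prints: [(1, '2021-01-01 09:00'), (2, '2021-01-01 11:00')]
```

Job 2's first 'Assigned' row is skipped because the status right after it is 'Rejected'.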
You can use the LEAD window function as follows:
Select jobcode, min(ts) as 'Time First Assigned' from
(select j.jobcode, log.timestamp as ts, JobStatusID ,
lead(JobStatusID) over (partition by Log.JobID order by Log.timestamp) as lead_statusid
from Job J
inner join JobLog Log on J.JobID = Log.JobID
where j.CompletionDate >= #Start_date and j.CompletionDate < #End_date
) t
where JobStatusID = 'Assigned' and lead_statusid in ('accepted', 'attended')
Group by jobcode
Thank you very much.
I used Gordon's suggested code and once I changed the values to the names I used in my code I can confirm that it works.
I did look at the Lead function however I didn't know how to apply it.
Again thanks to everyone for helping with my query.

Q) Write a query to return Territory and corresponding Sales Growth (compare growth between periods Q4-2019 vs Q3-2019)

Q) Write a query to return Territory and corresponding Sales Growth (compare growth between periods Q4-2019 vs Q3-2019).
Tables given-
Cust_Sales: -Cust_id,product_sku,order_date,order_value,order_id,month
Cust_Territory: cust_id,territory_id,customer_city,customer_pincode
Use tables FCT_CUSTOMER_SALES (which has sales for each Customer) and MAP_CUSTOMER_TERRITORY (which provides Territory-to-Customer mapping) for this question.
Output format-
My solution-
Select (( - * 100) AS SALES_GROWTH , c.territory_id
(select sum(s.order_value) from FCT_CUSTOMER_SALES s inner join MAP_CUSTOMER_TERRITORY c on s.customer_id=c.customer_id where s.order_datetime between 1/07/2019 and 30/09/2019 group by c.territory_id) as,
(select sum(s.order_value) from FCT_CUSTOMER_SALES s inner join MAP_CUSTOMER_TERRITORY c on s.customer_id=c.customer_id where s.order_datetime between 1/10/2019 and 31/12/2019 group by c.territory_id) as
Group by c.territory_id
My solution is showing up as incorrect. Could anyone help me with the solution and let me know where my mistake is?
One option uses conditional aggregation. The idea is to filter the table on the two quarters at once, then use case expressions within the sum() aggregate function to compute the sales of each of them:
select c.territory_id,
       100.0 * ( sum(case when s.order_datetime >= date '2019-10-01' then s.order_value end)
               - sum(case when s.order_datetime < date '2019-10-01' then s.order_value end)
               ) / sum(case when s.order_datetime < date '2019-10-01' then s.order_value end) as sales_growth
from fct_customer_sales s
inner join map_customer_territory c on s.customer_id = c.customer_id
where s.order_datetime >= date '2019-07-01' and s.order_datetime < date '2020-01-01'
group by c.territory_id
You did not say which database you are using, and date features are highly vendor-dependent. This uses the standard DATE syntax to declare the literal dates; you might need to adapt that if your database does not support it.
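As a rough check of the conditional aggregation idea, the sketch below runs it against an in-memory SQLite database from Python. The rows and territory names are invented, the dates are stored as ISO text (so plain string literals replace the DATE syntax), and the column names customer_id and order_datetime are taken from the asker's query rather than from the table listing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fct_customer_sales (customer_id INTEGER, order_datetime TEXT, order_value REAL);
CREATE TABLE map_customer_territory (customer_id INTEGER, territory_id TEXT);
INSERT INTO map_customer_territory VALUES (1, 'North'), (2, 'North'), (3, 'South');
INSERT INTO fct_customer_sales VALUES
  (1, '2019-08-15', 100.0),   -- Q3, North
  (2, '2019-11-02', 150.0),   -- Q4, North
  (3, '2019-07-20', 200.0),   -- Q3, South
  (3, '2019-12-24', 100.0),   -- Q4, South
  (1, '2020-01-05', 999.0);   -- outside both quarters, filtered out
""")

# One pass over both quarters; each CASE picks the rows of one quarter.
rows = conn.execute("""
SELECT c.territory_id,
       (sum(CASE WHEN s.order_datetime >= '2019-10-01' THEN s.order_value END)
        - sum(CASE WHEN s.order_datetime <  '2019-10-01' THEN s.order_value END))
       * 100.0
       / sum(CASE WHEN s.order_datetime <  '2019-10-01' THEN s.order_value END) AS sales_growth
FROM fct_customer_sales s
JOIN map_customer_territory c ON c.customer_id = s.customer_id
WHERE s.order_datetime >= '2019-07-01' AND s.order_datetime < '2020-01-01'
GROUP BY c.territory_id
ORDER BY c.territory_id
""").fetchall()

print(rows)  # prints: [('North', 50.0), ('South', -50.0)]
```

North grows from 100 to 150 (+50%) and South drops from 200 to 100 (-50%). A territory with no Q3 sales would yield NULL, since the divisor is NULL.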

Having difficulty writing sub-query

I am a beginner with HiveQL. I am trying to write a faster, more efficient query but am having trouble with it. Can someone help me rewrite this query? Any tips you can provide for improving my queries would be appreciated as well.
select "AUDIOONLYtopctrbyweek37Q32015", weekofyear(day),op.order_id,oppty_amount, mv.order_start_date, mv.order_end_date, count(distinct rdz.listener_id) as listeners, sum(impressions) , sum(clicks), (sum(clicks)/sum(impressions)) as ctr, sum(oline_net_amount)
from ROLLUP_PST rdz
join dfp2ss mv on (rdz.order_id = mv.dfp_order_id)
join oppty_order_oline op on (mv.order_id = op.order_id)
where day >= '2015-09-07'
and day <= '2015-09-13'
and creative_size in ('2000x132','134x1285','2000x114')
group by "AUDIOONLYtopctrbyweek37Q32015", weekofyear(day),op.order_id,oppty_amount, mv.order_start_date, mv.order_end_date
order by ctr desc
limit 150;
Please try the below modified query. It will work for you.
select "AUDIOONLYtopctrbyweek37Q32015",week_of_year,order_id,oppty_amount,order_start_date,order_end_date, count(distinct listener_id) over (partition by "AUDIOONLYtopctrbyweek37Q32015",week_of_year,order_id,oppty_amount,order_start_date,order_end_date) from (select "AUDIOONLYtopctrbyweek37Q32015", weekofyear(day) as week_of_year,op.order_id as order_id,
oppty_amount, mv.order_start_date as order_start_date, mv.order_end_date as order_end_date,rdz.listener_id as listener_id
from ROLLUP_PST rdz,
dfp2ss mv,
oppty_order_oline op where rdz.order_id = mv.dfp_order_id and mv.order_id = op.order_id and day >= '2015-09-07' and day <= '2015-09-13'
and creative_size in ('2000x132','134x1285','2000x114')) z

Work Around for SQL Query 'NOT IN' that takes forever?

I am trying to run a query on an Oracle 10g DB to try and view 2 groups of transactions. I want to view basically anyone who has a transaction this year (2014) that also had a transaction in the previous 5 years. I then want to run a query for anyone who has a transaction this year (2014) that hasn't ordered from us in the last 5 years. I assumed I could do this with the 'IN' and 'NOT IN' features. The 'IN' query runs fine but the 'NOT IN' never completes. DB is fairly large which is probably why. Would love any suggestions from the experts!
*Notes, [TEXT] is a description of our Customer's Company name, sometimes the accounting department didn't tie this to our customer ID which left NULL values, so using TEXT as my primary grouping seemed to work although the name is obscure. CODE_D is a product line just to bring context to the name.
Below is my code:
SELECT CODE_D, sum(coalesce(credit_amount, 0) - coalesce(debet_amount,0)) as TOTAL
FROM gen_led_voucher_row_tab
WHERE ACCOUNTING_YEAR like '2014'
and TEXT NOT IN
(select TEXT
from gen_led_voucher_row_tab
where voucher_date >= '01-JUN-09'
and voucher_date < '01-JUN-14'
and (credit_amount > '1' or debet_amount > '1'))
GROUP BY CODE_D
Try using a LEFT JOIN instead of NOT IN:
SELECT t1.CODE_D, sum(coalesce(t1.credit_amount, 0) - coalesce(t1.debet_amount,0)) as TOTAL
FROM gen_led_voucher_row_tab t1
LEFT JOIN gen_led_voucher_row_tab t2
ON t1.TEXT = t2.TEXT
AND t2.voucher_date >= '01-JUN-09'
AND t2.voucher_date < '01-JUN-14'
AND (t2.credit_amount > '1' or t2.debet_amount > '1')
WHERE t1.ACCOUNTING_YEAR like '2014'
AND t2.TEXT IS NULL -- keep only rows with no match in the previous 5 years
GROUP BY t1.CODE_D
Also, make sure you have an index on the TEXT column.
You can improve performance by changing the Not In clause to a Where Not Exists, as follows:
Where Not Exists
(
Select 1
From gen_led_voucher_row_tab b
Where voucher_date >= '01-JUN-09'
and voucher_date < '01-JUN-14'
and (credit_amount > '1' or debet_amount > '1')
And a.Text = b.Text
)
You'll need to alias the first table as a for this to work. Essentially, with Not In you're pulling back a ton of data just to discard it. Not Exists can be evaluated as an anti-join (the negated form of a semi-join), which only checks whether a matching row exists and does not pull back any data, so you should see a significant improvement.
Your query, as of the current update to the question, should be this:
SELECT CODE_D,
sum(coalesce(credit_amount, 0) - coalesce(debet_amount,0)) as TOTAL
FROM gen_led_voucher_row_tab a
Where ACCOUNTING_YEAR like '2014'
And Not Exists
(
Select 1
From gen_led_voucher_row_tab b
Where voucher_date >= '01-JUN-09'
and voucher_date < '01-JUN-14'
and (credit_amount > '1' or debet_amount > '1')
And a.Text = b.Text
)
GROUP BY CODE_D
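One more thing worth checking when swapping NOT IN for NOT EXISTS: the two are not equivalent if the subquery column can contain NULL. The sketch below shows the difference on an in-memory SQLite database from Python. The two tables and the name column are hypothetical stand-ins for the self-join on TEXT, and whether TEXT is ever NULL in the real table is not known from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE current_year (name TEXT);
CREATE TABLE previous_years (name TEXT);
INSERT INTO current_year VALUES ('Acme'), ('Globex');
INSERT INTO previous_years VALUES ('Acme'), (NULL);   -- one unnamed historical row
""")

# NOT IN: a single NULL in the subquery makes the test unknown for every
# non-matching value, so no row qualifies.
not_in = conn.execute("""
SELECT a.name FROM current_year a
WHERE a.name NOT IN (SELECT b.name FROM previous_years b)
""").fetchall()

# NOT EXISTS: the NULL row simply never matches, so 'Globex' is returned.
not_exists = conn.execute("""
SELECT a.name FROM current_year a
WHERE NOT EXISTS (SELECT 1 FROM previous_years b WHERE b.name = a.name)
""").fetchall()

print(not_in, not_exists)  # prints: [] [('Globex',)]
```

So if the NOT EXISTS version returns more rows than the NOT IN version did, NULL values in the subquery column are a likely explanation.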