How to write a SQL query without JOIN?

Recently, during an interview, I was asked the following question. Given a table like the one below:
The requirement is to count how many orders and how many shipments there are per day (based on the date columns). The output needs to look like this:
I wrote the following query, but the interviewer asked me to write a SQL query that produces the same output without using JOIN or UNION.
SELECT
COALESCE(a.order_date, b.ship_date) AS activity_date, orders, shipments
FROM
(SELECT
order_date, COUNT(1) AS orders
FROM
your_table
GROUP BY 1) a
FULL JOIN
(SELECT
ship_date, COUNT(1) AS shipments
FROM your_table
GROUP BY 1) b ON a.order_date = b.ship_date
Is this possible? Could you please advise?

You can use UNION ALL and GROUP BY with conditional aggregation, as follows:
SELECT DATE_,
COUNT(CASE WHEN FLAG = 'ORDER' THEN 1 END) AS ORDERS,
COUNT(CASE WHEN FLAG = 'SHIP' THEN 1 END) AS SHIPMENTS
FROM (SELECT ORDER_DATE AS DATE_, 'ORDER' AS FLAG FROM YOUR_TABLE
UNION ALL
SELECT SHIP_DATE AS DATE_, 'SHIP' AS FLAG FROM YOUR_TABLE) T
GROUP BY DATE_
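To check the conditional-aggregation query end to end, here is a small sketch that runs it in SQLite from Python. The table name your_table, its columns, the sample rows, and the ISO-formatted date strings are assumptions for illustration; an ORDER BY is added only so the output order is predictable.

```python
import sqlite3

# Hypothetical sample data: (id, order_date, ship_date).
ROWS = [
    (1, "2020-01-01", "2020-01-02"),
    (2, "2020-01-01", "2020-01-03"),
    (3, "2020-01-02", "2020-01-03"),
]

# Each source row contributes one 'ORDER' row and one 'SHIP' row;
# the CASE expressions then count each kind per date.
QUERY = """
SELECT date_,
       COUNT(CASE WHEN flag = 'ORDER' THEN 1 END) AS orders,
       COUNT(CASE WHEN flag = 'SHIP'  THEN 1 END) AS shipments
FROM (SELECT order_date AS date_, 'ORDER' AS flag FROM your_table
      UNION ALL
      SELECT ship_date  AS date_, 'SHIP'  AS flag FROM your_table) t
GROUP BY date_
ORDER BY date_
"""


def daily_counts(rows):
    """Load rows into an in-memory table and return (date, orders, shipments)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE your_table (id INTEGER, order_date TEXT, ship_date TEXT)")
    con.executemany("INSERT INTO your_table VALUES (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in daily_counts(ROWS):
        print(row)
```

Dates that only appear as ship dates (2020-01-03 here) still get a row, with 0 orders, which is what the FULL JOIN in the question was for.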

In BigQuery, I would express this as:
select date, countif(n = 0) as orders, countif(n = 1) as numships
from t cross join
unnest(array[order_date, ship_date]) date with offset n
group by 1
order by date;
The advantage of this approach over UNION ALL is twofold. First, it scans the table only once. More importantly, the unnest() happens on the same node where the data resides, so no data has to be moved to perform the unpivot.
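SQLite has no UNNEST, but its json_each() table-valued function can play a similar role. The sketch below is an analogue of the BigQuery query, not the answer's own code: it packs the two dates of each row into a JSON array and uses the array index (json_each's key column) the way the BigQuery version uses the offset n. It assumes an SQLite build with the JSON functions (built in since SQLite 3.38 and often compiled into earlier builds); the table, columns, and rows are made up. Like the BigQuery version, it is still a join under the hood (the comma in FROM is an implicit cross join), but the table is read only once.

```python
import sqlite3

# Hypothetical sample data: (id, order_date, ship_date).
ROWS = [
    (1, "2020-01-01", "2020-01-02"),
    (2, "2020-01-01", "2020-01-03"),
    (3, "2020-01-02", "2020-01-03"),
]

# json_each() turns each two-element array into two rows:
# key 0 is the order date, key 1 is the ship date.
QUERY = """
SELECT j.value AS activity_date,
       COUNT(CASE WHEN j.key = 0 THEN 1 END) AS orders,
       COUNT(CASE WHEN j.key = 1 THEN 1 END) AS shipments
FROM your_table, json_each(json_array(order_date, ship_date)) AS j
GROUP BY j.value
ORDER BY j.value
"""


def daily_counts_unpivot(rows):
    """Single pass over your_table, unpivoting each row in place."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE your_table (id INTEGER, order_date TEXT, ship_date TEXT)")
    con.executemany("INSERT INTO your_table VALUES (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in daily_counts_unpivot(ROWS):
        print(row)
```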

Related

How to aggregate different CTEs in outer query SQL

I am trying to join two CTEs to get the difference in performance across countries, grouped by campaign ID. Here is my example.
Every campaign can run in several countries, so how can I group at the end to get one row per campaign ID?
CTE 1: (planned)
select
country
, campaign_id
, sum(sales) as planned_sales
from table_x
group by 1,2
CTE 2: (Actual)
select
country
, campaign_id
, sum(sales) as actual_sales
from table_y
group by 1,2
Outer select:
select
cte1.country,
cte1.planned_sales,
cte2.actual_sales,
cte1.planned_sales - cte2.actual_sales as diff
from cte1
join cte2
on cte1.campaign_id = cte2.campaign_id
This should do it:
select
cte1.campaign_id,
sum(cte1.planned_sales) as planned_sales,
sum(cte2.actual_sales) as actual_sales,
sum(cte1.planned_sales) - sum(cte2.actual_sales) as diff
from cte1
join cte2
on cte1.campaign_id = cte2.campaign_id and cte1.country = cte2.country
group by 1
I would suggest using a full join, so that rows that appear in only one of the CTEs are still included. Your query is basically correct, but it needs a group by.
select campaign_id,
sum(cte1.planned_sales) as planned_sales,
sum(cte2.actual_sales) as actual_sales,
(coalesce(sum(cte1.planned_sales), 0) -
coalesce(sum(cte2.actual_sales), 0)
) as diff
from cte1 full join
cte2
using (campaign_id, country)
group by campaign_id;
That said, there is no reason for the CTEs to aggregate by both campaign and country. They could aggregate by campaign_id alone, which simplifies the query and improves performance.
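A minimal SQLite sketch of that last suggestion, run from Python: each CTE aggregates by campaign_id only, so the join is one-to-one and needs no outer GROUP BY. The table names table_x and table_y, their (campaign_id, country, sales) columns, and the sample rows are assumptions. For brevity it uses an inner join, which assumes every campaign appears in both tables; if that is not guaranteed, use the full join with coalesce() from the answer above.

```python
import sqlite3

# Hypothetical rows: (campaign_id, country, sales).
PLANNED = [(1, "US", 100), (1, "DE", 50), (2, "US", 70)]
ACTUAL = [(1, "US", 90), (1, "DE", 60), (2, "US", 40)]

# Aggregating per campaign inside each CTE gives one row per campaign_id
# on each side, so the join cannot multiply rows across countries.
QUERY = """
WITH cte1 AS (
    SELECT campaign_id, SUM(sales) AS planned_sales
    FROM table_x
    GROUP BY campaign_id
),
cte2 AS (
    SELECT campaign_id, SUM(sales) AS actual_sales
    FROM table_y
    GROUP BY campaign_id
)
SELECT cte1.campaign_id,
       cte1.planned_sales,
       cte2.actual_sales,
       cte1.planned_sales - cte2.actual_sales AS diff
FROM cte1
JOIN cte2 ON cte1.campaign_id = cte2.campaign_id
ORDER BY cte1.campaign_id
"""


def campaign_diffs(planned=PLANNED, actual=ACTUAL):
    """Return (campaign_id, planned_sales, actual_sales, diff) per campaign."""
    con = sqlite3.connect(":memory:")
    for name, rows in (("table_x", planned), ("table_y", actual)):
        con.execute(f"CREATE TABLE {name} (campaign_id INTEGER, country TEXT, sales INTEGER)")
        con.executemany(f"INSERT INTO {name} VALUES (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in campaign_diffs():
        print(row)
```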

SQL Syntax using MAX in nested Query

I'm looking for the syntax to return only products whose latest process date has a transaction status of "Paid".
So something like…
Select Products
From Table 1
Where MAX(Process_date) … *(as I don’t know what to do here)*
AND Transactions IN ‘Paid’
AND product_key = z.product_key
This will then be used as a nested query, correlated with an outer query that uses z as its alias. Any help?
One method is a correlated subquery:
Select t.*
From Table1 t
where t.process_date = (select max(t2.process_date)
from Table1 t2
where t2.product_key = t.product_key
) and
t.status = 'Paid';
If you just want the product key, then there is a fun method using aggregation:
select product_key
from table1
group by product_key
having max(process_date) = max(case when status = 'Paid' then process_date end);
This tests whether a product's latest process_date is also the latest process_date on which it has a 'Paid' status.
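Here is a small SQLite sketch, run from Python, that checks both queries agree. The table name table1, the columns product_key, process_date, and status, and the sample rows are assumptions; dates are stored as ISO strings so that max() compares them chronologically.

```python
import sqlite3

# Hypothetical rows: (product_key, process_date, status).
ROWS = [
    (1, "2020-01-01", "Pending"),
    (1, "2020-01-05", "Paid"),      # latest row is Paid -> keep
    (2, "2020-01-01", "Paid"),
    (2, "2020-01-03", "Refunded"),  # latest row is not Paid -> drop
    (3, "2020-01-02", "Paid"),      # only row is Paid -> keep
]

# Correlated subquery: keep rows that are the latest for their product and Paid.
CORRELATED = """
SELECT t.product_key
FROM table1 t
WHERE t.process_date = (SELECT MAX(t2.process_date)
                        FROM table1 t2
                        WHERE t2.product_key = t.product_key)
  AND t.status = 'Paid'
ORDER BY t.product_key
"""

# Aggregation: latest date overall must equal latest Paid date.
AGGREGATED = """
SELECT product_key
FROM table1
GROUP BY product_key
HAVING MAX(process_date) = MAX(CASE WHEN status = 'Paid' THEN process_date END)
ORDER BY product_key
"""


def run(query, rows=ROWS):
    """Load rows into an in-memory table1 and return the matching product keys."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE table1 (product_key INTEGER, process_date TEXT, status TEXT)")
    con.executemany("INSERT INTO table1 VALUES (?, ?, ?)", rows)
    keys = [row[0] for row in con.execute(query)]
    con.close()
    return keys


if __name__ == "__main__":
    print(run(CORRELATED), run(AGGREGATED))
```

A product with no Paid rows at all is excluded by the aggregation version too, because the CASE yields NULL and the comparison with NULL is not true.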

SQL query to sum a column prior to date and show all entries after that date

I have a table of limits sanctioned to customers.
I am trying to get the output shown in the picture below, i.e. the total amount sanctioned up to each date.
I tried the code below, but it sums the entire sanction amount:
select gam.id, sum(gam.SANCTION_AMOUNT) from gam
join (select ID, ACCOUNT_OPEN_DATE from gam where ACCOUNT_OPEN_DATE between '01-04-2019' and '30-04-2019' AND SCHEME_CODE IN ('SB','CCKLY')) action
on (gam.ACCOUNT_OPEN_DATE <= action.ACCOUNT_OPEN_DATE and gam.id = action.ID) group by gam.id;
In Oracle, one way to do this is a running total with a window function:
select id, sanction_amount, scheme_code, account_open_date,
sum(sanction_amount) over (partition BY ID order by account_open_date) as total_sanction_amount
from gam
order by account_open_date
I'm not sure whether your database is MySQL or Oracle, but the script below works in most databases. Just adjust the table and column names accordingly.
SELECT A.*,
(
SELECT SUM(sanction_Amount)
FROM Your_Table B
WHERE B.ID = A.ID
AND B.acc_open_date <= A.acc_open_date
) Total_sanction_Amount
FROM Your_Table A
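The sketch below runs both answers in SQLite (3.25 or later, for window functions) from Python and checks that they produce the same running totals. The table gam, its columns, and the sample rows are assumptions. Dates are stored as ISO 'YYYY-MM-DD' strings: text in the question's 'DD-MM-YYYY' form would not sort chronologically, which would break both the window's ORDER BY and the <= comparison.

```python
import sqlite3

# Hypothetical rows: (id, sanction_amount, scheme_code, account_open_date).
ROWS = [
    (1, 100, "SB", "2019-04-01"),
    (1, 200, "SB", "2019-04-10"),
    (1, 50, "CCKLY", "2019-04-20"),
    (2, 300, "SB", "2019-04-05"),
]

# Running total via a window function (the Oracle answer).
WINDOW = """
SELECT id, account_open_date,
       SUM(sanction_amount) OVER (PARTITION BY id ORDER BY account_open_date)
FROM gam
ORDER BY id, account_open_date
"""

# Running total via a correlated subquery (the portable answer).
CORRELATED = """
SELECT a.id, a.account_open_date,
       (SELECT SUM(b.sanction_amount)
        FROM gam b
        WHERE b.id = a.id
          AND b.account_open_date <= a.account_open_date)
FROM gam a
ORDER BY a.id, a.account_open_date
"""


def run(query, rows=ROWS):
    """Load rows into an in-memory gam table and return (id, date, running total)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE gam (id INTEGER, sanction_amount INTEGER, "
                "scheme_code TEXT, account_open_date TEXT)")
    con.executemany("INSERT INTO gam VALUES (?, ?, ?, ?)", rows)
    result = con.execute(query).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in run(WINDOW):
        print(row)
```

The window version reads the table once, while the correlated subquery re-aggregates for every row, so the window form usually scales better where it is available.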

What is the most efficient way to find the first and last entry of an entity in SQL?

I was asked this question in an interview. A table, trips, contains the columns (customer_id, start_from, end_at, start_at_time, end_at_time). Each trip is stored as a separate row, and part of the table looks like this:
How would you find all customers who, yesterday, started their day at point A and ended it at point P?
I provided a solution that used window functions to find all customers who started their day at A, and then inner-joined that list with the customers who ended their day at P (found with the same window functions).
The solution I gave was this:
SELECT a.customer_id
FROM
(SELECT a.customer_id
FROM
(SELECT customer_id,
start_from,
row_number() OVER (PARTITION BY customer_id
ORDER BY start_at_time ASC) AS rnk
FROM trips
WHERE to_date(start_at_time)= date_sub(CURRENT_DATE, 1) ) AS a
WHERE a.rnk=1
AND a.start_from='A' ) AS a
INNER JOIN
(SELECT a.customer_id
FROM
(SELECT customer_id,
end_at,
row_number() OVER (PARTITION BY customer_id
ORDER BY end_at_time DESC) AS rnk
FROM trips
WHERE to_date(end_at_time)= date_sub(CURRENT_DATE, 1) ) AS a
WHERE a.rnk=1
AND a.end_at='P' ) AS b ON a.customer_id=b.customer_id
My interviewer said my solution was correct, but that there is a more efficient way to solve this problem. I've been searching for a more efficient approach but have not found one so far. Can you suggest one?
I might use first_value() for this:
select t.customer_id
from (select t.*,
first_value(start_from) over (partition by customer_id order by start_at_time) as first_start,
first_value(end_at) over (partition by customer_id order by start_at_time desc) as last_end
from trips t
where start_at_time >= date_sub(CURRENT_DATE, 1) and
start_at_time < CURRENT_DATE
) t
where first_start = start_from and -- just some filtering so select distinct is not needed
first_start = 'A' and
last_end = 'P';
I should add that many databases offer an aggregate equivalent of first_value() (for example, Oracle's MIN(start_from) KEEP (DENSE_RANK FIRST ORDER BY start_at_time)), and I would use that instead.
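This assumes that a customer does not start more than one trip from the same point in a day; otherwise the same customer_id can be returned more than once. To be safe, you can add select distinct, at some cost in performance.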
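A runnable check of the first_value() query in SQLite (3.25 or later) from Python. Assumptions: timestamps are stored as 'YYYY-MM-DD HH:MM' strings, the sample trips are made up, and "yesterday" is passed in as explicit bounds (2020-01-01 to 2020-01-02 here) instead of date_sub(CURRENT_DATE, 1), so the result does not depend on when it runs.

```python
import sqlite3

# Hypothetical trips: (customer_id, start_from, end_at, start_at_time, end_at_time).
TRIPS = [
    (1, "A", "B", "2020-01-01 08:00", "2020-01-01 09:00"),
    (1, "B", "P", "2020-01-01 12:00", "2020-01-01 13:00"),  # A ... P -> match
    (2, "B", "A", "2020-01-01 09:00", "2020-01-01 10:00"),
    (2, "A", "P", "2020-01-01 13:00", "2020-01-01 14:00"),  # first start is B
    (3, "A", "C", "2020-01-01 07:00", "2020-01-01 08:00"),
    (3, "C", "D", "2020-01-01 18:00", "2020-01-01 19:00"),  # last end is D
    (4, "P", "Q", "2019-12-31 06:00", "2019-12-31 07:00"),  # previous day, filtered out
    (4, "A", "P", "2020-01-01 10:00", "2020-01-01 11:00"),  # single trip A -> P
]

# first_start: start point of the earliest trip in the window.
# last_end: end point of the latest trip (same ordering, descending).
QUERY = """
SELECT customer_id
FROM (SELECT t.*,
             FIRST_VALUE(start_from) OVER (PARTITION BY customer_id
                                           ORDER BY start_at_time) AS first_start,
             FIRST_VALUE(end_at) OVER (PARTITION BY customer_id
                                       ORDER BY start_at_time DESC) AS last_end
      FROM trips t
      WHERE start_at_time >= ? AND start_at_time < ?) x
WHERE first_start = start_from
  AND first_start = 'A'
  AND last_end = 'P'
ORDER BY customer_id
"""


def customers_a_to_p(day_start, day_end, rows=TRIPS):
    """Return customers whose first trip in [day_start, day_end) starts at A
    and whose last trip ends at P."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE trips (customer_id INTEGER, start_from TEXT, end_at TEXT, "
                "start_at_time TEXT, end_at_time TEXT)")
    con.executemany("INSERT INTO trips VALUES (?, ?, ?, ?, ?)", rows)
    result = [row[0] for row in con.execute(QUERY, (day_start, day_end))]
    con.close()
    return result


if __name__ == "__main__":
    print(customers_a_to_p("2020-01-01", "2020-01-02"))
```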
A generalized version of what I would probably have done:
SELECT fandl.a
FROM (
SELECT a, MIN(start) AS t0, MAX(start) AS tN
FROM someTable
WHERE start >= DATE_SUB(CURRENT_DATE, 1) AND start < CURRENT_DATE
GROUP BY a
) AS fandl
INNER JOIN someTable AS st0 ON fandl.a = st0.a AND fandl.t0 = st0.start
INNER JOIN someTable AS stN ON fandl.a = stN.a AND fandl.tN = stN.start
WHERE st0.b1 = 'A' AND stN.b2 = 'P'
;
I used the same date function you did, since you did not specify an SQL dialect.
Note that, in many RDBMS, if there is an (a, start) index, the subquery and joins can be done with the index alone; actual table access would only be required for the final WHERE evaluation.
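Below, the generalized query is mapped onto the trips columns (a → customer_id, start → start_at_time, b1 → start_from, b2 → end_at; this mapping is an assumption) and run in SQLite from Python. The index on (customer_id, start_at_time) mirrors the note above; whether a given engine answers the subquery and joins from that index alone depends on its planner. The sample trips and the explicit day bounds (used instead of CURRENT_DATE) are made up.

```python
import sqlite3

# Hypothetical trips: (customer_id, start_from, end_at, start_at_time, end_at_time).
TRIPS = [
    (1, "A", "B", "2020-01-01 08:00", "2020-01-01 09:00"),
    (1, "B", "P", "2020-01-01 12:00", "2020-01-01 13:00"),
    (2, "B", "A", "2020-01-01 09:00", "2020-01-01 10:00"),
    (2, "A", "P", "2020-01-01 13:00", "2020-01-01 14:00"),
    (3, "A", "C", "2020-01-01 07:00", "2020-01-01 08:00"),
    (3, "C", "D", "2020-01-01 18:00", "2020-01-01 19:00"),
    (4, "P", "Q", "2019-12-31 06:00", "2019-12-31 07:00"),
    (4, "A", "P", "2020-01-01 10:00", "2020-01-01 11:00"),
]

# fandl finds each customer's first (t0) and last (tn) start time in the day;
# the two self-joins fetch those trips to check where they started and ended.
QUERY = """
SELECT fandl.customer_id
FROM (SELECT customer_id,
             MIN(start_at_time) AS t0,
             MAX(start_at_time) AS tn
      FROM trips
      WHERE start_at_time >= ? AND start_at_time < ?
      GROUP BY customer_id) AS fandl
JOIN trips AS st0
  ON fandl.customer_id = st0.customer_id AND fandl.t0 = st0.start_at_time
JOIN trips AS stn
  ON fandl.customer_id = stn.customer_id AND fandl.tn = stn.start_at_time
WHERE st0.start_from = 'A' AND stn.end_at = 'P'
ORDER BY fandl.customer_id
"""


def customers_a_to_p_minmax(day_start, day_end, rows=TRIPS):
    """MIN/MAX-then-join version of the first/last trip check."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE trips (customer_id INTEGER, start_from TEXT, end_at TEXT, "
                "start_at_time TEXT, end_at_time TEXT)")
    con.execute("CREATE INDEX trips_cust_start ON trips (customer_id, start_at_time)")
    con.executemany("INSERT INTO trips VALUES (?, ?, ?, ?, ?)", rows)
    result = [row[0] for row in con.execute(QUERY, (day_start, day_end))]
    con.close()
    return result


if __name__ == "__main__":
    print(customers_a_to_p_minmax("2020-01-01", "2020-01-02"))
```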

How to make a SQL query for last transaction of every account?

Say I have a table "transactions" that has columns "acct_id", "trans_date", and "trans_type", and I want to filter this table so that I have just the last transaction for each account. Clearly I could do something like
SELECT acct_id, max(trans_date) as trans_date
FROM transactions GROUP BY acct_id;
but then I lose my trans_type. I could then run a second SQL query with my list of dates and account IDs to get trans_type back, but that feels very kludgy, since it means either sending data back and forth to the SQL server or creating a temporary table.
Is there a way to do this with a single query, ideally a generic method that works with MySQL, PostgreSQL, SQL Server, and Oracle?
This is an example of a greatest-n-per-group query. This question comes up several times per week on StackOverflow. In addition to the subquery solutions given by other folks, here's my preferred solution, which uses no subquery, GROUP BY, or CTE:
SELECT t1.*
FROM transactions t1
LEFT OUTER JOIN transactions t2
ON (t1.acct_id = t2.acct_id AND t1.trans_date < t2.trans_date)
WHERE t2.acct_id IS NULL;
In other words, return a row such that no other row exists with the same acct_id and a greater trans_date.
This solution assumes that trans_date is unique for a given account, otherwise ties may occur and the query will return all tied rows. But this is true for all the solutions given by other folks too.
I prefer this solution because I most often work on MySQL, which doesn't optimize GROUP BY very well. So this outer join solution usually proves to be better for performance.
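A small SQLite sketch, run from Python, of the outer-join solution. The sample rows are made up; account 2 deliberately has two transactions on its latest date to show the tie behavior described above: both tied rows are returned.

```python
import sqlite3

# Hypothetical rows: (acct_id, trans_date, trans_type).
ROWS = [
    (1, "2020-01-01", "deposit"),
    (1, "2020-01-05", "withdrawal"),
    (2, "2020-01-02", "deposit"),
    (2, "2020-01-03", "fee"),       # tie on the latest date
    (2, "2020-01-03", "interest"),  # tie on the latest date
]

# Keep t1 only when no t2 row for the same account has a later date
# (the LEFT JOIN finds no match, so t2.acct_id is NULL).
QUERY = """
SELECT t1.acct_id, t1.trans_date, t1.trans_type
FROM transactions t1
LEFT OUTER JOIN transactions t2
  ON t1.acct_id = t2.acct_id AND t1.trans_date < t2.trans_date
WHERE t2.acct_id IS NULL
ORDER BY t1.acct_id, t1.trans_type
"""


def last_transactions(rows=ROWS):
    """Return the latest transaction(s) per account, including ties."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE transactions (acct_id INTEGER, trans_date TEXT, trans_type TEXT)")
    con.executemany("INSERT INTO transactions VALUES (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in last_transactions():
        print(row)
```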
This works on SQL Server...
SELECT acct_id, trans_date, trans_type
FROM transactions a
WHERE trans_date = (
SELECT MAX( trans_date )
FROM transactions b
WHERE a.acct_id = b.acct_id
)
Try this
WITH
LastTransaction AS
(
SELECT acct_id, max(trans_date) as trans_date
FROM transactions
GROUP BY acct_id
),
AllTransactions AS
(
SELECT acct_id, trans_date, trans_type
FROM transactions
)
SELECT *
FROM AllTransactions
INNER JOIN LastTransaction
ON AllTransactions.acct_id = LastTransaction.acct_id
AND AllTransactions.trans_date = LastTransaction.trans_date
The same idea with a derived table instead of CTEs:
select t.acct_id, t.trans_type, tm.trans_date
from transactions t
inner join (
SELECT acct_id, max(trans_date) as trans_date
FROM transactions
GROUP BY acct_id
) tm on t.acct_id = tm.acct_id and t.trans_date = tm.trans_date