SQL Query: Looking to calculate the difference between two columns in two different tables?

I'm looking to calculate the difference between the sum of two different columns in two different tables. Here's what I have:
SELECT sum(amount)
FROM variable_in
where user_id='111111'
minus
SELECT sum(amount)
FROM variable_out
where user_id='111111'
When I do this, I just get the output of the first query. How do I have it execute both queries (for the in and out tables) and subtract the variable_out total for the amount column? Both totals will always be positive integers.
Thanks in advance! Most of the other tips I've seen have been overly complex compared to my issue.

it's very simple...
select
(select sum(amount) from variable_in where user_id='111111')
-
(select sum(amount) from variable_out where user_id='111111')
as amount;

How about moving the queries to the from clause and using -:
SELECT in_amount - out_amount
FROM (SELECT sum(amount) as in_amount
FROM variable_in
WHERE user_id = '111111'
) i CROSS JOIN
(SELECT sum(amount) as out_amount
FROM variable_out
WHERE user_id = '111111'
) o;
Your query is confusing the set operation "minus" with the numerical operator -. Admittedly, they do have the same name. But minus works with sets, not numbers.
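For illustration, a minimal sketch of what the set version does, assuming an Oracle-style MINUS (most other databases spell it EXCEPT): it removes from the first result any rows that also appear in the second, which is row filtering rather than subtraction.
-- Set difference: amount rows for the user in variable_in that have no matching row in variable_out.
SELECT amount FROM variable_in WHERE user_id = '111111'
MINUS
SELECT amount FROM variable_out WHERE user_id = '111111';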
I should point out that you can also use the nested queries directly as values in the SELECT list ("scalar subqueries"):
SELECT ((SELECT sum(amount) as in_amount
         FROM variable_in
         WHERE user_id = '111111'
        ) -
        (SELECT sum(amount) as out_amount
         FROM variable_out
         WHERE user_id = '111111'
        )
       ) as diff
FROM dual;
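One caveat worth hedging on: if the user has no rows in one of the tables, that SUM() returns NULL and the whole difference becomes NULL. A minimal sketch of a guard, using the table and column names from the question:
-- COALESCE turns a NULL SUM() (no rows for the user) into 0 so the subtraction stays numeric.
SELECT COALESCE((SELECT sum(amount) FROM variable_in WHERE user_id = '111111'), 0)
     - COALESCE((SELECT sum(amount) FROM variable_out WHERE user_id = '111111'), 0) AS amount;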

Related

How to write SQL query without join?

Recently during an interview I was asked a question: if I have a table like the one below:
The requirement is: how many orders and how many shipments per day (based on the date column)? The output needs to look like this:
I have written the following code, but the interviewer asked me to write a SQL query without JOIN or UNION that achieves the same output.
SELECT
    COALESCE(a.order_date, b.ship_date), orders, shipments
FROM
    (SELECT
         order_date, COUNT(1) AS orders
     FROM
         table
     GROUP BY 1) a
FULL JOIN
    (SELECT
         ship_date, COUNT(1) AS shipments
     FROM
         table
     GROUP BY 1) b ON a.order_date = b.ship_date
Is this possible? Could you guys please advise?
You can use UNION ALL and GROUP BY with conditional aggregation as follows:
SELECT DATE_,
       COUNT(CASE WHEN FLAG = 'ORDER' THEN 1 END) AS ORDERS,
       COUNT(CASE WHEN FLAG = 'SHIP' THEN 1 END) AS SHIPMENTS
FROM (SELECT ORDER_DATE AS DATE_, 'ORDER' AS FLAG FROM YOUR_TABLE
      UNION ALL
      SELECT SHIP_DATE AS DATE_, 'SHIP' AS FLAG FROM YOUR_TABLE) T
GROUP BY DATE_
In BigQuery, I would express this as:
select date, countif(n = 0) as orders, countif(n = 1) as numships
from t cross join
unnest(array[order_date, ship_date]) date with offset n
group by 1
order by date;
The advantage of this approach (over union all) is twofold. First, it only scans the table once. More importantly, the unnest() happens on the same node where the data resides, so no data needs to be moved for the unpivot.
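For readers not on BigQuery, here is a rough Postgres-style sketch of the same single-scan unpivot, assuming the table and column names used above (your_table, order_date, ship_date); treat it as an illustration of the idea rather than the exact query.
-- Each row contributes one (date_, n) pair per column via a lateral VALUES list,
-- playing the role of BigQuery's unnest(...) WITH OFFSET.
SELECT v.date_,
       COUNT(*) FILTER (WHERE v.n = 0) AS orders,
       COUNT(*) FILTER (WHERE v.n = 1) AS shipments
FROM your_table t
CROSS JOIN LATERAL (VALUES (t.order_date, 0), (t.ship_date, 1)) AS v(date_, n)
GROUP BY v.date_
ORDER BY v.date_;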

SQL Total Distinct Count on Group By Query

I'm trying to get an overall distinct count of the employees for a range of records that has a GROUP BY on it.
I've tried using the OVER() clause but couldn't get that to work. It's best to explain with an example, so please see my script and wanted result below.
EDIT:
I should mention I'm hoping for a solution that does not use a sub-query based on my "sales_detail" table below because in my real example, the "sales_detail" table is a very complex sub-query.
Here's the result I want. Column "wanted_result" should be 9:
Sample script:
CREATE TEMPORARY TABLE [sales_detail] (
[employee] varchar(100),[customer] varchar(100),[startdate] varchar(100),[enddate] varchar(100),[saleday] int,[timeframe] varchar(100),[saleqty] numeric(18,4)
);
INSERT INTO [sales_detail]
([employee],[customer],[startdate],[enddate],[saleday],[timeframe],[saleqty])
VALUES
('Wendy','Chris','8/1/2019','8/12/2019','5','Afternoon','1'),
('Wendy','Chris','8/1/2019','8/12/2019','5','Morning','5'),
('Wendy','Chris','8/1/2019','8/12/2019','6','Morning','6'),
('Dexter','Chris','8/1/2019','8/12/2019','2','Mid','2.5'),
('Jennifer','Chris','8/1/2019','8/12/2019','4','Morning','2.75'),
('Lila','Chris','8/1/2019','8/12/2019','2','Morning','3.75'),
('Rita','Chris','8/1/2019','8/12/2019','2','Mid','1'),
('Tony','Chris','8/1/2019','8/12/2019','4','Mid','2'),
('Tony','Chris','8/1/2019','8/12/2019','1','Morning','6'),
('Mike','Chris','8/1/2019','8/12/2019','4','Mid','1.5'),
('Logan','Chris','8/1/2019','8/12/2019','3','Morning','6.25'),
('Blake','Chris','8/1/2019','8/12/2019','4','Afternoon','0.5')
;
SELECT
[timeframe],
SUM([saleqty]) AS [total_qty],
COUNT(DISTINCT [s].[employee]) AS [employee_count1],
SUM(COUNT(DISTINCT [s].[employee])) OVER() AS [employee_count2],
9 AS [wanted_result]
FROM (
SELECT
[employee],[customer],[startdate],[enddate],[saleday],[timeframe],[saleqty]
FROM
[sales_detail]
) AS [s]
GROUP BY
[timeframe]
;
If I understand correctly, you are simply looking for a COUNT(DISTINCT) for all employees in the table? I believe this query will return the results you are looking for:
SELECT
[timeframe],
SUM([saleqty]) AS [total_qty],
COUNT(DISTINCT [s].[employee]) AS [employee_count1],
(SELECT COUNT(DISTINCT [employee]) FROM [sales_detail]) AS [employee_count2],
9 AS [wanted_result]
FROM #sales_detail [s]
GROUP BY
[timeframe]
You can try the option below:
SELECT
[timeframe],
SUM([saleqty]) AS [total_qty],
COUNT(DISTINCT [s].[employee]) AS [employee_count1],
SUM(COUNT(DISTINCT [s].[employee])) OVER() AS [employee_count2],
[wanted_result]
-- select the count from a subquery
FROM (
SELECT
[employee],[customer],[startdate],[enddate],[saleday],[timeframe],[saleqty],
(select COUNT(DISTINCT [employee]) from [sales_detail]) AS [wanted_result]
-- calculate the count with the first subquery
FROM [sales_detail]
) AS [s]
GROUP BY
[timeframe],[wanted_result]
Use a trick where you only count each person on the first day they are seen:
select timeframe, sum(saleqty) as total_qty,
       count(distinct employee) as employee_count1,
       sum( (seqnum = 1)::int ) as employee_count2,
       9 as wanted_result
from (select sd.*,
             row_number() over (partition by employee order by startdate) as seqnum
      from sales_detail sd
     ) sd
group by timeframe;
Note: From the perspective of performance, your complex subquery is only evaluated once.

SQL (BigQuery): How do I use a single value derived from another query?

This is my query:
WITH last_transaction AS (
SELECT
month
FROM db.transactions
ORDER BY date DESC
LIMIT 1
)
SELECT
*
FROM db.transactions
-- WHERE month = last_transaction.month
WHERE month = 11
GROUP BY
id
The commented-out line doesn't work, but the intention is clear, I assume: I need to select the transactions for the latest month. The business logic might not make sense because I've extracted it from a bigger query. The main question is: how do I use a single value derived from another query?
You have only one row, so you can use a scalar subquery:
SELECT t.*
FROM db.transactions t
WHERE month = (SELECT last_transaction.month FROM last_transaction);
I removed the GROUP BY id because it would be a syntax error in BigQuery and it logically does not make sense. Why would a column called id be duplicated in the table?
However, this query would often be written as:
SELECT t.*
FROM (SELECT t.*, MAX(month) OVER () as max_month
      FROM db.transactions t
     ) t
WHERE month = max_month;
Try joining the last_transaction CTE.
A bit like this:
SELECT *
FROM db.transactions
JOIN last_transaction
  ON db.transactions.month = last_transaction.month

SQL query - percentage of sub sample

I got a SQL statement:
Select
ID, GroupID, Profit
From table
I now want to add a fourth column: percentage of group profits.
Therefore the query should sum all the profits for the same group id and then have that number divided by the profit for the unique ID.
Is there a way to do this? The regular sum function does not seem to do the trick.
Thanks
select t1.ID,
t1.GroupID,
(t1.Profit * 1.0) / t2.grp_profit as percentage_profit
from table t1
inner join
(
select GroupID, sum(Profit) as grp_profit
from table
group by GroupID
) t2 on t1.groupid = t2.groupid
One more option, with a window function:
select ID, GroupID, Profit * 1. / SUM(profit) OVER(PARTITION BY GroupID)
from t1
An alternative solution using scalar sub-queries is as follows:
select t1.ID, t1.GroupID, (select sum(t2.Profit) * 1.0 / t1.Profit
from table t2
where t2.GroupID = t1.GroupID) as percentage_profit
from table t1;
An alternative answer, albeit a less efficient one, is to use a scalar subquery.
SELECT ID, GroupId, Profit, (Profit/(SELECT sum(Profit)
FROM my_table
WHERE GroupId= mt.GroupId))*100 as pct
FROM my_table as mt
From the way it reads I'm not sure if you want "percentage of group profits" or whether you want group_profit / individual_profit.
That's the way this sounds: "Therefore the query should sum all the profits for the same group id and then have that number divided by the profit for the unique ID".
Either way, just switch the divisor for what you want (both directions are sketched after the next query)!
Also, if you're using PostgreSQL >= 8.4 you can use a window function.
SELECT ID, GroupId, Profit, (Profit/ (sum(Profit) OVER(partition by GroupId)))*100 as pct
FROM core_dev.my_table as mt
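To make the "switch the divisor" point concrete, here is a minimal sketch showing both directions side by side, assuming the table name my_table and a database with window functions; pick whichever ratio you actually meant.
SELECT ID, GroupId, Profit,
       Profit * 100.0 / SUM(Profit) OVER (PARTITION BY GroupId) AS pct_of_group,   -- individual profit as a share of the group total
       SUM(Profit) OVER (PARTITION BY GroupId) * 1.0 / Profit AS group_over_self   -- group total divided by the individual profit
FROM my_table;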

Compare SQL groups against each other

How can one filter a grouped resultset for only those groups that meet some criterion compared against the other groups? For example, only those groups that have the maximum number of constituent records?
I had thought that a subquery as follows should do the trick:
SELECT * FROM (
SELECT *, COUNT(*) AS Records
FROM T
GROUP BY X
) t HAVING Records = MAX(Records);
However the addition of the final HAVING clause results in an empty recordset... what's going on?
In MySQL (which I assume you are using, since you posted SELECT *, COUNT(*) FROM T GROUP BY X, which would fail in all the other RDBMSs that I know of), you can use:
SELECT T.*
FROM T
INNER JOIN
( SELECT X, COUNT(*) AS Records
FROM T
GROUP BY X
ORDER BY Records DESC
LIMIT 1
) T2
ON T2.X = T.X
This has been tested in MySQL and removes the implicit grouping/aggregation.
If you can use window functions together with either TOP ... WITH TIES / LIMIT or common table expressions, it becomes even shorter:
Windowed function + CTE: (MS SQL-Server & PostgreSQL Tested)
WITH CTE AS
( SELECT *, COUNT(*) OVER(PARTITION BY X) AS Records
FROM T
)
SELECT *
FROM CTE
WHERE Records = (SELECT MAX(Records) FROM CTE)
Windowed Function with TOP (MS SQL-Server Tested)
SELECT TOP 1 WITH TIES *
FROM ( SELECT *, COUNT(*) OVER(PARTITION BY X) [Records]
       FROM T
     ) T2
ORDER BY Records DESC
Lastly, I have never used Oracle, so apologies for not adding a solution that works on Oracle...
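For what it's worth, here is a speculative ANSI-style sketch that should work on Oracle or any database with window functions; treat it as an untested assumption rather than a verified solution.
-- Rank the per-X row counts and keep every row of the group(s) tied for the top rank.
WITH CTE AS
( SELECT T.*, COUNT(*) OVER (PARTITION BY X) AS Records
  FROM T
)
SELECT *
FROM ( SELECT CTE.*, RANK() OVER (ORDER BY Records DESC) AS rnk
       FROM CTE
     ) ranked
WHERE rnk = 1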
EDIT
My solution for MySQL did not take ties into account, and my suggested fix somewhat steps on the toes of what you have said you want to avoid (duplicate subqueries), so I am not sure I can help after all. However, just in case it is preferable, here is a version that will work as required on your fiddle:
SELECT T.*
FROM T
INNER JOIN
( SELECT X
FROM T
GROUP BY X
HAVING COUNT(*) =
( SELECT COUNT(*) AS Records
FROM T
GROUP BY X
ORDER BY Records DESC
LIMIT 1
)
) T2
ON T2.X = T.X
For the exact question you give, one way to look at it is that you want the group of records where no other group has more records. So if you say
SELECT taxid, COUNT(*) as howMany
FROM flats
GROUP BY taxid
you get all counties and their counts.
Then you can treat that expression as a table by making it a subquery and giving it an alias. Below I assign two "copies" of the query the names X and Y and ask for the taxids in X for which no taxid in Y has a higher count. If two groups are tied at the same (maximum) count, I'd get two or more rows. Different databases have proprietary syntax, notably TOP and LIMIT, that makes this kind of query simpler and easier to understand.
SELECT taxid FROM
(select taxid, count(*) as HowMany from flats
GROUP by taxid) as X
WHERE NOT EXISTS
(
SELECT * from
(
SELECT taxid, count(*) as HowMany FROM
flats
GROUP by taxid
) AS Y
WHERE Y.howmany > X.howmany
)
Try this:
SELECT * FROM (
    SELECT t.*, MAX(Records) OVER () AS max_records FROM (
        SELECT *, COUNT(*) AS Records
        FROM T
        GROUP BY X
    ) t
) t2
WHERE Records = max_records
I'm sorry that I can't test the validity of this query right now.