Creating Columns for Summed Answers - sql

So I'm trying to move my row info into the column - I haven't used a Pivot before and tried using it now but clearly I'm doing something wrong -.-;
Here is my original Query
(Select
CASE WHEN ee.ExpenseTypeID=1 THEN Sum(Amount)
WHEN ee.ExpenseTypeID=2 THEN Sum(Amount)
WHEN ee.ExpenseTypeID=3 THEN Sum(Amount)
WHEN ee.ExpenseTypeID=4 THEN Sum(Amount) END as Amount,
et.ExpenseDescription,
ee.UserID
From ExpensesEntries ee, ExpenseTypes et
Where ee.ExpenseTypeID=et.ExpenseTypeID
Group By ee.ExpenseTypeID, et.ExpenseDescription, ee.UserID
Order By UserID) b
Which produces something like this
Amount | ExpenseDescription | UserID
----------------------------------------
156.00 | Upload | 123
----------------------------------------
23.00 | Parking | 123
----------------------------------------
15.37 | Other | 123
----------------------------------------
112.00 | Other | 456
----------------------------------------
28.50 | Parking | 456
----------------------------------------
What I would like to do
UserID | Upload | Parking | Other
----------------------------------
123 | 156.00 | 23.00 | 15.37
----------------------------------
456 | NULL | 28.50 | 112.00
I tried doing this - which is the equivalent of slapping on a Pivot but with the Cases in my original select - I'm not sure if I need to get rid of them entirely and add it into Pivot ?
Select
CASE WHEN ee.ExpenseTypeID=1 THEN Sum(Amount)
WHEN ee.ExpenseTypeID=2 THEN Sum(Amount)
WHEN ee.ExpenseTypeID=3 THEN Sum(Amount)
WHEN ee.ExpenseTypeID=4 THEN Sum(Amount) END as Amount,
et.ExpenseDescription,
ee.UserID
From ExpensesEntries ee, ExpenseTypes et
Where ee.ExpenseTypeID=et.ExpenseTypeID
Group By ee.ExpenseTypeID, et.ExpenseDescription, ee.UserID
Order By UserID
PIVOT
(Amount
FOR et.ExpenseDescription IN ('Upload','Other','Parking')
) as pvt

You didn't provide details on what each of the ExpenseTypeId values relate to but you can do this using an aggregate function with a CASE expression. The CASE expression will check the value of the ExpenseTypeId and get a sum of the amount - you will then name the column with the ExpenseDescription:
select ee.UserId,
sum(case when ee.ExpenseTypeID=1 then Amount else 0 end) Upload,
sum(case when ee.ExpenseTypeID=2 then Amount else 0 end) Parking,
sum(case when ee.ExpenseTypeID=3 then Amount else 0 end) Other
from ExpensesEntries ee
inner join ExpenseTypes et
on ee.ExpenseTypeID=et.ExpenseTypeID
group by ee.UserId;
If you want to use PIVOT, then you would alter your query to something similar to the following. This uses a join between your 2 tables in a subquery and you return the UserId, ExpenseDescription and Amount. The PIVOT will sum the amount for each ExpenseDescription - this process converts the descriptions to columns:
select UserId, Upload, Other, Parking
from
(
select ee.UserId,
et.ExpenseDescription,
Amount
from ExpensesEntries ee
inner join ExpenseTypes et
on ee.ExpenseTypeID=et.ExpenseTypeID
) d
pivot
(
sum(Amount)
for ExpenseDescription in (Upload, Other, Parking)
) p;
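For anyone who wants to try the conditional-aggregation idea without a SQL Server instance, here is a minimal sketch using Python's built-in sqlite3 module. SQLite has no PIVOT operator, so only the SUM(CASE ...) form is shown. The ID-to-description mapping and the sample rows are assumptions made up to match the question's output, and this variant matches on the description and omits ELSE 0 so that a missing expense type comes back as NULL, as in the desired result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ExpenseTypes (ExpenseTypeID INTEGER PRIMARY KEY, ExpenseDescription TEXT);
CREATE TABLE ExpensesEntries (UserID INTEGER, ExpenseTypeID INTEGER, Amount REAL);
-- Assumed mapping of IDs to descriptions; the question never states it.
INSERT INTO ExpenseTypes VALUES (1, 'Upload'), (2, 'Parking'), (3, 'Other');
-- Made-up entries chosen to reproduce the totals shown in the question.
INSERT INTO ExpensesEntries VALUES
  (123, 1, 100.00), (123, 1, 56.00), (123, 2, 23.00), (123, 3, 15.37),
  (456, 3, 112.00), (456, 2, 28.50);
""")

# One SUM(CASE ...) per output column; no ELSE branch, so a user with
# no rows of a given type gets NULL (None in Python) instead of 0.
pivot_sql = """
SELECT ee.UserID,
       SUM(CASE WHEN et.ExpenseDescription = 'Upload'  THEN ee.Amount END) AS Upload,
       SUM(CASE WHEN et.ExpenseDescription = 'Parking' THEN ee.Amount END) AS Parking,
       SUM(CASE WHEN et.ExpenseDescription = 'Other'   THEN ee.Amount END) AS Other
FROM ExpensesEntries ee
INNER JOIN ExpenseTypes et ON ee.ExpenseTypeID = et.ExpenseTypeID
GROUP BY ee.UserID
ORDER BY ee.UserID
"""
rows = conn.execute(pivot_sql).fetchall()
for row in rows:
    print(row)
```

If you prefer 0 over NULL, add ELSE 0 back into each CASE expression.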

How to create this BigQuery query for a retail dataset

I have a table with user retail transactions. It includes sales and cancels. If Qty is positive, the row is a sale; if negative, it is a cancel. I want to attach each cancel to the most appropriate sale. So, I have a table like this:
| CustomerId | StockId | Qty | Date |
|--------------+-----------+-------+------------|
| 1 | 100 | 50 | 2020-01-01 |
| 1 | 100 | -10 | 2020-01-10 |
| 1 | 100 | 60 | 2020-02-10 |
| 1 | 100 | -20 | 2020-02-10 |
| 1 | 100 | 200 | 2020-03-01 |
| 1 | 100 | 10 | 2020-03-05 |
| 1 | 100 | -90 | 2020-03-10 |
User with ID 1 has the following actions: buy 50 -> return 10 -> buy 60 -> return 20 -> buy 200 -> buy 10 -> return 90. For each cancel row (with negative Qty) I want to find the previous row (by Date) with a positive Qty that is greater than the cancel Qty.
So I need a BigQuery query that creates a table like this:
| CustomerId | StockId | Qty | Date | CancelQty |
|--------------+-----------+-------+------------+-------------|
| 1 | 100 | 50 | 2020-01-01 | -10 |
| 1 | 100 | 60 | 2020-02-10 | -20 |
| 1 | 100 | 200 | 2020-03-01 | -90 |
| 1 | 100 | 10 | 2020-03-05 | 0 |
Can anybody help me with this query? I have written one candidate query (split cancels and sales, join them, and do some extra work to remove duplicates), but it works incorrectly in the above case.
I use BigQuery, so any BQ SQL features could be applied.
Any ideas will be helpful.
You can use the following query.
;WITH result AS (
select t1.*,t2.Qty as cQty,t2.Date as Date_t2 from
(select *,ROW_NUMBER() OVER (ORDER BY qty DESC) AS [ROW NUMBER] from Test) t1
join
(select *,ROW_NUMBER() OVER (ORDER BY qty) AS [ROW NUMBER] from Test) t2
on t1.[ROW NUMBER] = t2.[ROW NUMBER]
)
select CustomerId,StockId,Qty,Date,ISNULL(cQty, 0) As CancelQty,Date_t2
from (select CustomerId,StockId,Qty,Date,case
when cQty < 0 then cQty
else NULL
end AS cQty,
case
when cQty < 0 then Date_t2
else NULL
end AS Date_t2 from result) t
where qty > 0
order by cQty desc
result: https://dbfiddle.uk
You can do this as a gaps-and-islands problem. Basically, add a grouping column to the rows based on a cumulative reverse count of negative values. Then within each group, choose the first row where the sum is positive. So:
select t.* except (cancelqty, grp),
(case when min(case when cancelqty + qty >= 0 then date end) over (partition by customerid, grp) = date
then cancelqty
else 0
end) as cancelqty
from (select t.*,
-- the (negative) cancel quantity of this group, or NULL if the group has no cancel
min(case when qty < 0 then qty end) over (partition by customerid, grp) as cancelqty
from (select t.*,
countif(qty < 0) over (partition by customerid order by date desc) as grp
from transactions t
) t
) t
where qty > 0;
Note: This works for the data you have provided. However, there may be complicated scenarios where this does not work. In fact, I don't think there is a simple optimal solution assuming that the returns are not connected to the original sales. I would suggest that you fix the data model so you record where the returns come from.
The query below seems to satisfy the conditions and produce the output mentioned. The solution maps each row of the base table (t) to its corresponding canceled-quantity row from the same table (t1).
First, a self join on CustomerId and StockId is done, since the rows need to correspond to the same customer and product.
Additionally, we bring in only the canceled transactions t1 that happened on or after the base row in table t (t.Dt<=t1.Dt), and the clause t1.Qty<0 ensures the joined row has a negative quantity.
Further, we cannot attribute a cancellation to a sale if the canceled quantity is larger than the original quantity. So I check that the sale quantity is at least the canceled quantity, negating the cancel quantity so the two can be compared easily: -(t1.Qty)<=t.Qty
After the join, we are interested only in the positive quantities, so a where clause t.Qty>0 filters the cancel rows out of the base table t.
Now each sale row is joined to every qualifying cancel row dated on or after it. For example, the Qty 50 can have several canceled quantities mapped to it, but we are interested only in the first one that came after. So we group by the base row columns and then keep the canceled row with the earliest date using the Having clause condition HAVING IFNULL(t1.dt, '0')=MIN(IFNULL(t1.dt, '0'))
Finally, we get the rows we need, and the last column can be excluded, if required, using an outer select query.
SELECT t.CustomerId,t.StockId,t.Qty,t.Dt,IFNULL(t1.Qty, 0) CancelQty
,t1.dt dt_t1
FROM tbl t
LEFT JOIN tbl t1 ON t.CustomerId=t1.CustomerId AND
t.StockId=t1.StockId
AND t.Dt<=t1.Dt AND t1.Qty<0 AND -(t1.Qty)<=t.Qty
WHERE t.Qty>0
GROUP BY 1,2,3,4
HAVING IFNULL(t1.dt, '0')=MIN(IFNULL(t1.dt, '0'))
ORDER BY 1,2,4,3
Consider below approach
with sales as (
select * from `project.dataset.table` where Qty > 0
), cancels as (
select * from `project.dataset.table` where Qty < 0
)
select any_value(s).*,
ifnull(array_agg(c.Qty order by c.Date limit 1)[offset(0)], 0) as CancelQty
from sales s
left join cancels c
on s.CustomerId = c.CustomerId
and s.StockId = c.StockId
and s.Date <= c.Date
and s.Qty > abs(c.Qty)
group by format('%t', s)
if applied to sample data in your question - output is
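To check the matching rule without a BigQuery project, here is a rough sketch of the same sale-to-cancel logic in SQLite via Python's sqlite3 module. It is not BigQuery syntax: the array_agg(... limit 1) step is swapped for a correlated subquery with ORDER BY ... LIMIT 1, and the table name tx is a placeholder I made up. The rows are the sample data from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- 'tx' is a hypothetical table name holding the question's sample rows.
CREATE TABLE tx (CustomerId INTEGER, StockId INTEGER, Qty INTEGER, Date TEXT);
INSERT INTO tx VALUES
  (1, 100,  50, '2020-01-01'),
  (1, 100, -10, '2020-01-10'),
  (1, 100,  60, '2020-02-10'),
  (1, 100, -20, '2020-02-10'),
  (1, 100, 200, '2020-03-01'),
  (1, 100,  10, '2020-03-05'),
  (1, 100, -90, '2020-03-10');
""")

# For every sale, pick the earliest cancel on or after the sale date whose
# absolute quantity is smaller than the sale quantity; default to 0.
match_sql = """
SELECT s.CustomerId, s.StockId, s.Qty, s.Date,
       COALESCE((SELECT c.Qty
                 FROM tx c
                 WHERE c.CustomerId = s.CustomerId
                   AND c.StockId = s.StockId
                   AND c.Qty < 0
                   AND c.Date >= s.Date
                   AND s.Qty > ABS(c.Qty)
                 ORDER BY c.Date
                 LIMIT 1), 0) AS CancelQty
FROM tx s
WHERE s.Qty > 0
ORDER BY s.Date
"""
rows = conn.execute(match_sql).fetchall()
for row in rows:
    print(row)
```

Note that this rule can attach the same cancel to more than one sale on other data sets, so treat it as a check of the sample case rather than a general solution.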

In SQL, group on a column, but do not show the total in all rows

I am working on a database query. Say I have a patient who travelled to 3 hospitals. I want to add up the cost of his journeys but show the total cost only on the row for his first journey.
The data right now looks like this:
Patient Hospital1 cost
A 1 200
A 2 400
A 3 100
B 1 200
I want the output as
Patient Hospital Cost
A 1 700
A 2
A 3
B 1 200
Thanks
You can use window functions:
select t.*,
(case when row_number() over (partition by patient order by hospital1) = 1
then sum(cost) over (partition by patient)
end) as total_cost
from t
order by patient, hospital1;
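As a quick way to see what the window-function query returns, here is a small sketch that runs it in SQLite through Python's sqlite3 module. It assumes an SQLite build with window-function support (3.25 or later), and the table name t and its column names are taken from the query above rather than from the asker's real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Sample rows from the question; table and column names are assumptions.
CREATE TABLE t (patient TEXT, hospital1 INTEGER, cost INTEGER);
INSERT INTO t VALUES ('A', 1, 200), ('A', 2, 400), ('A', 3, 100), ('B', 1, 200);
""")

# The total is computed for every row of a patient, but only the row
# numbered 1 within the patient keeps it; the others get NULL.
total_sql = """
SELECT patient, hospital1,
       CASE WHEN ROW_NUMBER() OVER (PARTITION BY patient ORDER BY hospital1) = 1
            THEN SUM(cost) OVER (PARTITION BY patient)
       END AS total_cost
FROM t
ORDER BY patient, hospital1
"""
rows = conn.execute(total_sql).fetchall()
for row in rows:
    print(row)
```

The blank cells in the desired output correspond to NULL (None in Python) here; turning them into empty strings is a presentation step.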
CREATE TABLE MyTable(Patient varchar(20),Hospital int, Cost int);
INSERT INTO MyTable(Patient,Hospital,Cost) VALUES ('A',1,200),('A',2,400),
('A',3,100),
('B',1,200);
WITH CTE AS (SELECT Patient,SUM(Cost) AS Cost FROM MyTable
GROUP BY Patient)
SELECT M.Patient,M.Hospital,
CASE WHEN ROW_NUMBER() OVER (PARTITION BY M.Patient ORDER BY M.Hospital)=1 THEN CAST(C.Cost AS VARCHAR(255))
ELSE '' END AS Cost FROM CTE AS C
INNER JOIN MyTable AS M ON C.Patient=M.Patient;
Patient | Hospital | Cost
:------ | -------: | :---
A | 1 | 700
A | 2 |
A | 3 |
B | 1 | 200
If you had the records in a table, called PatientVisits, with the following data:
You could use this query:
SELECT PV.Patient, PV.Hospital, PVGroup.TotalCost
FROM PatientVisits PV
LEFT JOIN (
SELECT Patient, MIN(Hospital) as FirstVisit, SUM(Cost) as TotalCost
FROM PatientVisits
GROUP BY Patient) PVGroup ON
PV.Patient = PVGroup.Patient AND
PV.Hospital = PVGroup.FirstVisit
ORDER BY PV.Patient, PV.Hospital
The results would be:

Conditional grouping of records

I have a problem with grouping records in PostgreSQL. I have a structure containing 3 columns: a non-unique id, name, and group (it's an old system and I can't change this structure).
Sample records:
id | name | group
-----+----------+------
1 | product1 | 0
1 | product1 | test
2 | product2 | test
3 | product3 | test123
I want the rows whose group is not 0 to be collapsed into one row per group (taking the id and name of the first record in the group).
The expected result:
id | name | group
-----+----------+------
1 | product1 | 0
1 | product1 | test
3 | product3 | test123
Currently I count the records in the following way:
SELECT
COUNT(CASE WHEN group = '0' THEN group END) +
COUNT(DISTINCT CASE WHEN group <> '0' THEN group END) AS count
FROM
table
Is this the correct way? How can I convert it to retrieve the records?
You can use row_number():
select id, name, group
from (select t.*, row_number() over (partition by group order by id) as seqnum
from t
) t
where seqnum = 1 or group = '0';
Note: group is a really bad name for a column. It is a SQL keyword, so you should escape the name. I am leaving it as is because your query uses it.
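To illustrate both the row_number() filter and the escaping mentioned in the note, here is a minimal sketch run against SQLite from Python rather than PostgreSQL. The table name t comes from the query above, the column is double-quoted as "group" (standard SQL quoting, which PostgreSQL also accepts), and an SQLite build with window functions (3.25 or later) is assumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- "group" is a reserved word, so it is double-quoted everywhere.
CREATE TABLE t (id INTEGER, name TEXT, "group" TEXT);
INSERT INTO t VALUES
  (1, 'product1', '0'),
  (1, 'product1', 'test'),
  (2, 'product2', 'test'),
  (3, 'product3', 'test123');
""")

# Keep every row of group '0', and only the first row (lowest id)
# of every other group.
dedupe_sql = """
SELECT id, name, "group"
FROM (SELECT t.*,
             ROW_NUMBER() OVER (PARTITION BY "group" ORDER BY id) AS seqnum
      FROM t) AS ranked
WHERE seqnum = 1 OR "group" = '0'
ORDER BY id, "group"
"""
rows = conn.execute(dedupe_sql).fetchall()
for row in rows:
    print(row)
```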

Get MAX and MIN in a row SQL

;WITH CTE AS
(
SELECT * FROM
(
SELECT CandidateID, t_Candidate.Name, ISNULL(CAST(AVG(Rate) AS DECIMAL(12,2)),0) AS Rate, t_Ambassadors.Name AS CN
FROM t_Vote INNER JOIN t_Candidate
ON t_Vote.CandidateID = t_Candidate.ID
INNER JOIN t_Ambassadors
ON t_Vote.AmbassadorID = t_Ambassadors.ID
GROUP BY Rate, CandidateID, t_Candidate.Name, t_Ambassadors.Name
)MySrc
PIVOT
(
AVG(Rate)
FOR CN IN ([Jean],[Anna],[Felicia])
)AS nSrc
)SELECT CandidateID, Name, CAST([Jean] AS DECIMAL(12,2)) AS AHH ,CAST([Anna] AS DECIMAL(12,2)) AS MK,CAST([Felicia] AS DECIMAL(12,2)) AS DIL, CAST(([Jean] + [Anna] + [Felicia])/3 AS DECIMAL(12,2)) AS Total
FROM CTE
GROUP BY Cte.CandidateID, cte.Name, cte.[Jean], cte.[Anna], cte.[Felicia]
I have solved my previous problem with the above query. I created a new question because I have a new problem: how do I get the MAX and MIN rate in a row?
The following is the result I get from the above query:
| CandidateID | Name | AHH | MK | DIL | Total |
|-------------|------|-------|------|------|-------|
| CID1 | Jay | 7.00 | 3.00 | 3.00 | 4.33 |
| CID2 | Mia | 2.00 | 9.00 | 7.00 | 6.00 |
What I want to achieve is this:
| CandidateID | Name | AHH | MK | DIL | Total |
|-------------|------|-------|------|------|-------|
| CID1 | Jay | 7.00 | 3.00 | 3.00 | 3.00 |
| CID2 | Mia | 2.00 | 9.00 | 7.00 | 7.00 |
So in the second result, the highest and lowest rate are removed from each row and the Total is the average of the remaining rates. AHH, MK and DIL are not the only voters; there are 14 of them. I just took the first 3 to keep it short and clear.
I believe you're looking for something like the following (though I'm using case aggregation rather than a pivot).
Essentially, it does the same thing your query does except that it uses a row number to figure out the highest and lowest and exclude them from the final "total" (in the case of a tie, it'll just select one of them, but you can use RANK() instead of row_number() if you don't want to include tied highest/lowest in the average):
WITH CTE AS
(
SELECT CandidateID,
Name,
CN,
Rate,
Lowest = ROW_NUMBER() OVER (PARTITION BY CandidateID, Name ORDER BY Rate),
Highest = ROW_NUMBER() OVER (PARTITION BY CandidateID, Name ORDER BY Rate DESC)
FROM
(
SELECT CandidateID,
t_Candidate.Name,
CN = t_Ambassadors.Name,
Rate = ISNULL(CAST(AVG(Rate) AS DECIMAL(12,2)),0)
FROM t_Vote
JOIN t_Candidate
ON t_Vote.CandidateID = t_Candidate.ID
JOIN t_Ambassadors
ON t_Vote.AmbassadorID = t_Ambassadors.ID
GROUP BY CandidateID, t_Candidate.Name, t_Ambassadors.Name
) AS T
)
SELECT CandidateID,
Name,
AHH = MAX(CASE WHEN CN = 'Jean' THEN Rate END),
MK = MAX(CASE WHEN CN = 'Anna' THEN Rate END),
DIL = MAX(CASE WHEN CN = 'Felicia' THEN Rate END), -- and so on and so forth for each CN
Total = AVG(CASE WHEN Lowest != 1 AND Highest != 1 THEN Rate END)
FROM CTE
GROUP BY CandidateID, Name;
EDIT: It is possible to do this using PIVOT, but unless I'm mistaken, it becomes a matter of working out the average of the ones that aren't highest and lowest before pivoting, which becomes a bit more convoluted. It's all around easier to use case aggregation, IMO.
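If it helps to see the drop-highest-and-lowest step on its own, here is a stripped-down sketch in SQLite via Python's sqlite3 module. It is not the T-SQL above: the three source tables are collapsed into one made-up votes table, the per-voter columns are left out, and only the trimmed average is computed. An SQLite build with window functions (3.25 or later) is assumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical flattened table: one already-averaged rate per candidate and voter.
CREATE TABLE votes (CandidateID TEXT, CN TEXT, Rate REAL);
INSERT INTO votes VALUES
  ('CID1', 'Jean', 7.0), ('CID1', 'Anna', 3.0), ('CID1', 'Felicia', 3.0),
  ('CID2', 'Jean', 2.0), ('CID2', 'Anna', 9.0), ('CID2', 'Felicia', 7.0);
""")

# Number the rates from both ends; the row numbered 1 in each direction is
# the single lowest / highest and is skipped by the CASE inside AVG.
trimmed_sql = """
WITH ranked AS (
  SELECT CandidateID, Rate,
         ROW_NUMBER() OVER (PARTITION BY CandidateID ORDER BY Rate)      AS Lowest,
         ROW_NUMBER() OVER (PARTITION BY CandidateID ORDER BY Rate DESC) AS Highest
  FROM votes
)
SELECT CandidateID,
       AVG(CASE WHEN Lowest != 1 AND Highest != 1 THEN Rate END) AS Total
FROM ranked
GROUP BY CandidateID
ORDER BY CandidateID
"""
rows = conn.execute(trimmed_sql).fetchall()
for row in rows:
    print(row)
```

With only three voters a single rate survives the trim; with all 14 voters the same query averages the remaining 12.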

Select only 1 payment from a table with customers with multiple payments

I have a table called "payments" where I store all the payments of my customers, and I need to write a select to calculate the non-payment rate in a given month.
A customer can have multiple payments in that month, but I should count each customer only once: 1 if any of the payments was made and 0 if none of the payments was made.
Example:
+----+------------+--------+
| ID | DATEDUE | AMOUNT |
+----+------------+--------+
| 1 | 2016-11-01 | 0 |
| 1 | 2016-11-15 | 20.00 |
| 2 | 2016-11-10 | 0 |
+----+------------+--------+
The result I expect for the November rate is:
+----+------------+--------+
| ID | DATEDUE | AMOUNT |
+----+------------+--------+
| 1 | 2016-11-15 | 20.00 |
| 2 | 2016-11-10 | 0 |
+----+------------+--------+
So the rate will be 50%.
But if the select is:
SELECT * FROM payment WHERE DATEDUE BETWEEN '2016-11-01' AND '2016-11-30'
It will return 3 rows and the rate will be 66%, which is wrong. Ideas?
PS: This is a simpler example of the real table. The real query have a lot of columns, subselects, etc.
It sounds like you need to partition your results per customer.
SELECT TOP 1 WITH TIES
ID,
DATEDUE,
AMOUNT
FROM payment
WHERE DATEDUE BETWEEN '2016-11-01' AND '2016-11-30'
ORDER BY ROW_NUMBER() OVER (PARTITION BY ID ORDER BY AMOUNT DESC)
PS: The BETWEEN operator is frowned upon by some people. For clarity it might be better to avoid it:
What do BETWEEN and the devil have in common?
Try this
SELECT
id
, SUM(AMOUNT) AS AMOUNT
FROM
Payment
GROUP BY
id;
This might help if you want other columns.
WITH cte AS (
SELECT
id
, ROW_NUMBER() OVER (PARTITION BY ID ORDER BY AMOUNT DESC ) AS RowNum
-- other columns
FROM
Payment
)
SELECT *
FROM
cte
WHERE
RowNum = 1;
To calculate the rate, you can use explicit division:
select 1 - count(distinct case when amount > 0 then id end) * 1.0 / count(distinct id)
from payment
where . . .;
Or, in a way that is perhaps easier to follow:
select avg(flag * 1.0)
from (select id, (case when max(amount) > 0 then 0 else 1 end) as flag
from payment
where . . .
group by id
) i
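Here is a small sketch of the per-customer flag approach, run in SQLite through Python's sqlite3 module so it can be tried without the real database. The table layout mirrors the simplified example in the question, the December row is an extra made-up row added only to show the date filter, and the elided where clause is filled with the November range from the question's own query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE payment (ID INTEGER, DATEDUE TEXT, AMOUNT REAL);
INSERT INTO payment VALUES
  (1, '2016-11-01', 0),
  (1, '2016-11-15', 20.00),
  (2, '2016-11-10', 0),
  (3, '2016-12-05', 0);  -- made-up row outside November, filtered out below
""")

# One row per customer: flag = 1 when no payment in the month was made,
# 0 when at least one was. The average of the flags is the non-payment rate.
rate_sql = """
SELECT AVG(flag * 1.0)
FROM (SELECT ID,
             CASE WHEN MAX(AMOUNT) > 0 THEN 0 ELSE 1 END AS flag
      FROM payment
      WHERE DATEDUE BETWEEN '2016-11-01' AND '2016-11-30'
      GROUP BY ID) AS per_customer
"""
rate = conn.execute(rate_sql).fetchone()[0]
print(rate)
```

Customer 1 paid once and customer 2 never paid, so one of the two customers counts as non-paying.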