Group table by custom column - SQL

I have a table transaction_transaction with columns:
id, status, total_amount, date_made, transaction_type
The status can be: Active, Paid, Trashed, Renewed, Void
So what I want to do is filter by date and status, but since sometimes there are no records with Renewed or Trashed, I get inconsistent data: grouping by status returns only Active and Paid (notice Renewed and Trashed are missing). I want it to always return something like:
-----------------------------------
Active | 121 | 2017-08-09
Paid | 122 | 2017-08-19
Trashed | 123 | 2017-08-20
Renewed | 123 | 2017-08-20
The SQL query I use:
SELECT
ST.type,
COALESCE(SUM(TR.total_amount), 0) AS amount
FROM sms_admin_status ST
LEFT JOIN transaction_transaction TR ON TR.status = ST.type
WHERE TR.store_id = 21 AND TR.transaction_type = 'Layaway' AND TR.status != 'Void'
AND TR.date_made >= '2018-02-01' AND TR.date_made <= '2018-02-26'
GROUP BY ST.type
Edit: I created a table sms_admin_status since you said it's bad not having a table and in the future I might have new statuses, and I also changed the query to fit my needs.
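For reference, a minimal sketch of what such a status table could look like (the real structure isn't shown in the question, so the column name and type below are assumptions based on the query):
-- Assumed lookup table of statuses (PostgreSQL syntax); only the statuses
-- that should always appear in the report are listed here.
CREATE TABLE sms_admin_status (
    type varchar(20) PRIMARY KEY
);
INSERT INTO sms_admin_status (type)
VALUES ('Active'), ('Paid'), ('Trashed'), ('Renewed');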

Use a VALUES list in a subquery to LEFT JOIN your transaction table. You may need to COALESCE your sums to have them return 0 instead of NULL.
https://www.postgresql.org/docs/10/static/queries-values.html
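A rough sketch of that idea applied to the query in the question (column names assumed from the question; note that the filters on TR are kept in the ON clause, because putting them in the WHERE clause would turn the LEFT JOIN back into an inner join and drop the missing statuses again):
SELECT ST.type,
       COALESCE(SUM(TR.total_amount), 0) AS amount
FROM (VALUES ('Active'), ('Paid'), ('Trashed'), ('Renewed')) AS ST(type)
LEFT JOIN transaction_transaction TR
       ON TR.status = ST.type
      AND TR.store_id = 21
      AND TR.transaction_type = 'Layaway'
      AND TR.date_made >= '2018-02-01'
      AND TR.date_made <= '2018-02-26'
GROUP BY ST.type;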

One possible solution (not a very nice one) is the following:
select statuses.s, date_made, coalesce(SUM(amount), 0)
from (values('active'),('inactive'),('deleted')) statuses(s)
left join transactions t on statuses.s = t.status and
date_made >= '2017-08-08'
group by statuses.s, date_made
I assume that you forgot to add date_made to the group by, so I added it there. As you can see, the possible values are hardcoded in the SQL. Another (much cleaner) solution is to create a table with the possible values of status and replace my hardcoded statuses with it.

Use SELECT ... FROM (VALUES) with restriction from the transaction table:
select * from (values('active', 0),('inactive', 0),('deleted', 0)) as statuses
where column1 not in (select status from transactions)
union select status, sum(amount) from transactions group by status
Add the date column as needed; I assume it's a static value.

The multiple WHERE conditions will limit the rows selected unless they are in a sub-query. May I suggest something like the following?
SELECT ST.type, COALESCE((SELECT SUM(TR.total_amount)
FROM transaction_transaction TR
WHERE TR.status = ST.type AND TR.store_id = 21 AND TR.transaction_type = 'Layaway' AND TR.status != 'Void'
AND TR.date_made >= '2018-02-01' AND TR.date_made <= '2018-02-26'), 0) AS amount
FROM sms_admin_status ST
GROUP BY ST.type

How to join tables in SQL?

I have two tables which name shoes_type and shoes_list. The shoes_type table includes shoes_id, shoes_size, shoes_type, date, project_id. Meanwhile, on the shoes_list table, I have shoes_quantity, shoes_id, shoes_color, date, project_id.
I need to get the sum of shoes_quantity based on the shoes_type, shoes_size, date, and also project_id.
I get how to sum the shoes_quantity based on color by doing:
select shoes_color, sum(shoes_quantity)
from shoes_list group by shoes_color
Basically what I want to see is the total quantity of shoes based on the type, size, date and project_id. The type and size information are available on shoes_type table, while the quantity is coming from the shoes_list table. I expect to see something like:
shoes_type  shoes_size  total quantity  date      project_id
-------------------------------------------------------------
heels       5           3               19/10/02  1
sneakers    5           3               19/10/02  1
sneakers    6           1               19/10/05  1
heels       7           5               19/10/03  1
To get the desired result, I have tried:
select shoes_type, shoes_size, date, project_id, sum(shoes_quantity)
from shoes_type st
join shoes_list sl
on st.project_id = sl.project_id
and st.shoes_id = sl.shoes_id
and st.date = sl.date
group by shoes_type, shoes_size, date, project_id
Unfortunately, I got an error that says that the column reference "date" is ambiguous.
How should I fix this?
Thank you.
The date column exists in both tables, so you have to specify which table to select it from. Replace date with st.date or sl.date (using the aliases from your query).
Qualify all column references to remove the "ambiguous" column error:
select st.shoes_type, st.shoes_size, st.date, st.project_id, sum(sl.shoes_quantity)
from shoes_type st join
shoes_list sl
on st.project_id = sl.project_id and
st.shoes_id = sl.shoes_id and
st.date = sl.date
group by st.shoes_type, st.shoes_size, st.date, st.project_id;
If you want all columns from shoes_type, you might find that a correlated subquery is faster:
select st.*,
(select sum(sl.shoes_quantity)
from shoes_list sl
where st.project_id = sl.project_id and
st.shoes_id = sl.shoes_id and
st.date = sl.date
)
from shoes_type st;

Grouping records on consecutive dates

If I have following table in Postgres:
order_dtls
Order_id  Order_date  Customer_name
-------------------------------------
1         11/09/17    Xyz
2         15/09/17    Lmn
3         12/09/17    Xyz
4         18/09/17    Abc
5         15/09/17    Xyz
6         25/09/17    Lmn
7         19/09/17    Abc
I want to retrieve customers who have placed orders on 2 consecutive days.
In the above case the customers Xyz and Abc should be returned as the result.
There are many ways to do this. An EXISTS semi-join followed by DISTINCT or GROUP BY should be among the fastest.
Postgres syntax:
SELECT DISTINCT customer_name
FROM order_dtls o
WHERE EXISTS (
SELECT 1 FROM order_dtls
WHERE customer_name = o.customer_name
AND order_date = o.order_date + 1 -- simple syntax for data type "date" in Postgres!
);
If the table is big, be sure to have an index on (customer_name, order_date) to make it fast - with the columns in that order.
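For example (a sketch; the index name is arbitrary):
-- Multicolumn index supporting the EXISTS lookup: customer_name first, then order_date.
CREATE INDEX order_dtls_customer_date_idx ON order_dtls (customer_name, order_date);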
To clarify, since Oto happened to post almost the same solution a bit faster:
DISTINCT is an SQL construct, a syntax element, not a function. Do not use parentheses like DISTINCT (customer_name). That would be short for DISTINCT ROW(customer_name) - a row constructor unrelated to DISTINCT - and is just noise for the simple case with a single expression, because Postgres removes the pointless row wrapper for a single element automatically. But if you wrap more than one expression like that, you get an actual row type - an anonymous record, actually, since no row type is given. That is most certainly not what you want.
What is a row constructor used for?
Also, don't confuse DISTINCT with DISTINCT ON (expr, ...). See:
Select first row in each GROUP BY group?
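A quick illustration of the difference, assuming the order_dtls table from the question (Postgres syntax):
-- DISTINCT de-duplicates entire result rows:
SELECT DISTINCT customer_name FROM order_dtls;
-- DISTINCT ON keeps one full row per customer, here the earliest order:
SELECT DISTINCT ON (customer_name) customer_name, order_date
FROM order_dtls
ORDER BY customer_name, order_date;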
Try something like...
SELECT `order_dtls`.*
FROM `order_dtls`
INNER JOIN `order_dtls` AS mirror
ON `order_dtls`.`Order_id` <> `mirror`.`Order_id`
AND `order_dtls`.`Customer_name` = `mirror`.`Customer_name`
AND DATEDIFF(`order_dtls`.`Order_date`, `mirror`.`Order_date`) = 1
The way I would think of doing it would be to join the table with itself, matching each date with the next date and also joining on the Customer_name.
This way you can ensure that the same customer placed an order on 2 consecutive days.
For MySQL:
SELECT distinct *
FROM order_dtls t1
INNER JOIN order_dtls t2 on
t1.Order_date = DATE_ADD(t2.Order_date, INTERVAL 1 DAY) and
t1.Customer_name = t2.Customer_name
You should also select with the DISTINCT keyword to ensure the same customer is not displayed more than once.
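For instance, a variant that returns each qualifying customer once (a sketch based on the join above):
SELECT DISTINCT t1.Customer_name
FROM order_dtls t1
INNER JOIN order_dtls t2
        ON t1.Customer_name = t2.Customer_name
       AND t1.Order_date = DATE_ADD(t2.Order_date, INTERVAL 1 DAY);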
For postgresql:
select distinct(Customer_name) from your_table
where exists
(select 1 from your_table t1
where
Customer_name = your_table.Customer_name and Order_date = your_table.Order_date+1 )
Same for MySQL, just instead of your_table.Order_date+1 use: DATE_ADD(your_table.Order_date , INTERVAL 1 DAY)
This should work:
SELECT A.customer_name
FROM order_dtls A
INNER JOIN (SELECT customer_name, order_date FROM order_dtls) as B
ON(A.customer_name = B.customer_name and Datediff(B.Order_date, A.Order_date) =1)
group by A.customer_name

SQL Server select max date per ID

I am trying to select the max date record for each service_user_id for each finance_charge_id, and the amount that is linked to the highest date.
select distinct
s.Finance_Charge_ID, MAX(s.start_date), s.Amount
from
Service_User_Finance_Charges s
where
s.Service_User_ID = '156'
group by
s.Finance_Charge_ID, s.Amount
The issue is that I receive multiple entries where the amount is different. I only want to receive the amount on the latest date for each finance_charge_id
At the moment I receive the below which is incorrect (the third line should not appear as the 1st line has a higher date)
Finance_Charge_ID  (No column name)  Amount
2                  2014-10-19        1.00
3                  2014-10-16        500.00
2                  2014-10-01        1000.00
Remove the Amount column from the group by to get the correct rows. You can then join that query onto the table again to get all the data you need. Here is an example using a CTE to get the max dates:
WITH MaxDates_CTE (Finance_Charge_ID, MaxDate) AS
(
select s.Finance_Charge_ID,
MAX(s.start_date) MaxDate
from Service_User_Finance_Charges s
where s.Service_User_ID = '156'
group by s.Finance_Charge_ID
)
SELECT *
FROM Service_User_Finance_Charges
JOIN MaxDates_CTE
ON MaxDates_CTE.Finance_Charge_ID = Service_User_Finance_Charges.Finance_Charge_ID
AND MaxDates_CTE.MaxDate = Service_User_Finance_Charges.start_date
This can be done using a window function which removes the need for a self join on the grouped data:
select Finance_Charge_ID,
start_date,
amount
from (
select s.Finance_Charge_ID,
s.start_date,
max(s.start_date) over (partition by s.Finance_Charge_ID) as max_date,
s.Amount
from Service_User_Finance_Charges s
where s.Service_User_ID = 156
) t
where start_date = max_date;
As the window function does not require you to use GROUP BY, you can add any additional columns you need to the output.
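For instance, if the table also had a (hypothetical) created_by column, it could simply be added to both select lists without touching the grouping logic:
select Finance_Charge_ID,
       start_date,
       amount,
       created_by          -- hypothetical extra column, for illustration only
from (
    select s.Finance_Charge_ID,
           s.start_date,
           s.Amount,
           s.created_by,   -- hypothetical extra column
           max(s.start_date) over (partition by s.Finance_Charge_ID) as max_date
    from Service_User_Finance_Charges s
    where s.Service_User_ID = 156
) t
where start_date = max_date;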

How to Return a Query Result Even When Nothing Comes Up for the Results

I have a table called "orders" with the following data:
Date       Order_No  Ship_Method
-------------------------------------
12/6/2013  1234567   RTS
12/6/2013  7654321
12/7/2013  3456789   RTS
12/7/2013  9876543
12/7/2013  1123456   RTS
12/7/2013  5523847   RTS
12/8/2013  8876549
12/8/2013  7733654
I need a query that will search for how many rows of data have "RTS" in the Ship_Method field and return the following result for the day I run the query (like below):
Date RTS_Shipments
-------------------------
12/8/2013 0
-Anthony C.
EDIT
The query that works for me concerning the above issue is the following (provided by Carth in the answers section):
select
sub.tdate,
count(orders.dt)
from (select to_date(to_char((sysdate-1),'mmddYYYY'),'mmddYYYY') tdate from dual) sub
left join orders on sub.tdate = orders.dt and orders.ship_method like '%RTS%'
group by sub.tdate
In order to get the result you need here, you will need to create a basis for your select in which the term you're filtering on definitely exists, and then use that to join to your target table. In my initial answer I was thinking that you could select a variable value directly from dual, but since I can't get the syntax of that to work out correctly, here are a couple of other ideas. You could create a new "working" table that has all the dates you want to check and then select against that, left joining to your target table (see the sketch after the query below), or you could create a subselect to hold your intended value and use that as the basis of your statement, like this:
select
sub.tdate,
count(orders.dt)
from (select to_date(to_char((sysdate-1),'mmddYYYY'),'mmddYYYY') tdate from dual) sub
left join orders on sub.tdate = orders.dt and orders.ship_method like '%RTS%'
group by sub.tdate
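For the "working" table alternative mentioned above, a rough sketch could look like this (Oracle syntax assumed; the table and column names here are made up for illustration):
-- One-off calendar table holding every date you want a row for.
CREATE TABLE report_dates (tdate DATE PRIMARY KEY);
-- Fill it with, say, the last 30 days using a CONNECT BY LEVEL row generator.
INSERT INTO report_dates (tdate)
SELECT TRUNC(SYSDATE) - LEVEL + 1
FROM dual
CONNECT BY LEVEL <= 30;
-- Left join from the calendar so days with no RTS orders still show up with 0.
SELECT d.tdate,
       COUNT(o.dt) AS rts_shipments
FROM report_dates d
LEFT JOIN orders o
       ON o.dt = d.tdate
      AND o.ship_method LIKE '%RTS%'
GROUP BY d.tdate
ORDER BY d.tdate;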
You can use a left join on the same table to get all the rows as well as the ones you are interested in, summing 1 for the rows you need and 0 for the ones you don't.
That should do it.
SELECT to_char(o1.DT, 'MM/DD/YYYY') As "Date",
SUM(case when o1.Ship_Method is null then 0 else 1 end) As RTS_Shipments
FROM orders o1
left join orders o2
on (o1.dt = o2.dt and o2.Ship_Method like '%RTS%')
WHERE trunc(o1.DT) = trunc(SYSDATE)
GROUP BY to_char(o1.DT, 'MM/DD/YYYY')
See it at fiddle: http://sqlfiddle.com/#!4/fd374/6
I've changed the date field to dt because date is a reserved word.
Also, this query can count every date, but it will show only the rows for the day you filter on: trunc(o1.DT) = trunc(SYSDATE). If you take out this condition you will see every date with the proper count.

Optimize SQL subquery containing multiple inner joins and aggregate functions

I have a select statement which is in fact a subquery within a larger select statement built up programmatically. The problem is that if I elect to include this subquery it acts as a bottleneck and the whole query becomes painfully slow.
An example of the data is as follows:
Payment
.Receipt_no | .Person | .Payment_date | .Type | .Reversed
----------------------------------------------------------
          2 | John    | 01/02/2001    | PA    |
          1 | John    | 01/02/2001    | GX    |
          3 | David   | 15/04/2003    | PA    |
          6 | Mike    | 26/07/2002    | PA    | R
          5 | John    | 01/01/2001    | PA    |
          4 | Mike    | 13/05/2000    | GX    |
          8 | Mike    | 27/11/2004    | PA    |
          7 | David   | 05/12/2003    | PA    | R
          9 | David   | 15/04/2003    | PA    |
The subquery is as follows :
select Payment.Person,
Payment.amount
from Payment
inner join (Select min([min_Receipt].Person) 'Person',
min([min_Receipt].Receipt_no) 'Receipt_no'
from Payment [min_Receipt]
inner join (select min(Person) 'Person',
min(Payment_date) 'Payment_date'
from Payment
where Payment.reversed != 'R' and Payment.Type != 'GX'
group by Payment.Person) [min_date]
on [min_date].Person= [min_Receipt].Person and [min_date].Payment_date = [min_Receipt].Payment_date
where [min_Receipt].reversed != 'R' and [min_Receipt].Type != 'GX'
group by [min_Receipt].Person) [1stPayment]
on [1stPayment].Receipt_no = Payment.Receipt_no
This retrieves the first payment of each person by .Payment_date (ascending), .Receipt_no (ascending), where .Type is not 'GX' and .Reversed is not 'R', as follows:
Payment
.Receipt_No | .Person | .Payment_date
          5 | John    | 01/01/2001
          3 | David   | 15/04/2003
          8 | Mike    | 27/11/2004
Following Ahmad's post:
From the following results
(3|David |15/04/2003)
and (9|David |15/04/2003)
I would only want the record with the lowest receipt_no. So
(3|David |15/04/2003)
So I added the aggregate function 'min(Payment.receipt_no)' grouping by person.
Query 1.
select min(a.Person) 'Person',
min(a.receipt_no) 'receipt_no'
from
Payment a
where
a.type<>'GX' and (a.reversed not in ('R') or a.reversed is null)
and a.payment_date =
(select min(payment_date) from Payment i
where i.Person=a.Person and i.type <> 'GX'
and (i.reversed not in ('R') or i.reversed is null))
group by a.Person
I added this as a subquery within my much larger query, however it still ran very slowly. So I tried rewriting the query whilst trying to avoid the use of aggregate functions and came up with the following.
Query 2.
SELECT
receipt_no,
person,
payment_date,
amount
FROM
payment a
WHERE
receipt_no IN
(SELECT
top 1 i.receipt_no
FROM
payment i
WHERE
(i.reversed NOT IN ('R') OR i.reversed IS NULL)
AND i.type<>'GX'
AND i.person = a.person
ORDER BY i.payment_date DESC, i.receipt_no ASC)
Which I wouldn't necessarily think of as more efficient. In fact, if I run the two queries side by side on my larger data set, Query 1 completes in a matter of milliseconds whereas Query 2 takes several seconds.
However, if I then add them as subqueries within the much larger query, the larger query completes in hours using Query 1 and completes in 40 seconds using Query 2.
I can only attribute this to the use of aggregate functions in one and not the other.
How do you distinguish the payments
(3|David |15/04/2003)
and (9|David |15/04/2003)?
These are both made by the same person on the same date. Unless the time is different, this query should work fine:
select
receipt_no,
person,
payment_date
from
payment a
where
type<>'GX' and (reversed not in ('R') or reversed is null)
and payment_date =
(select min(payment_date) from payment i
where i.person=a.person and i.type <> 'GX'
and (i.reversed not in ('R') or i.reversed is null))
order by person,payment_date desc
I have set up and tested this query on SQLFiddle, but I am not sure about the performance, since I don't have the amount of data that you have. So check and let me know
SQL Fiddle Demo for the Question above
Following a comment from CodeReview, I've also re-written the query using the RANK() function as suggested.
Query 3.
left join
(select
a.Person,
a.amount,
(rank () over (Partition by a.Person order by a.payment_date desc, a.receipt_no desc)) 'Ranked'
from
Payment a
Where
(a.reversed not in ('R') or a.reversed is null)
and a.type != 'GX'
) [lastPayment]
on
[lastPayment].Person = [Person].Person
and [lastPayment].ranked = 1
This method has also resulted in speeding up the larger query, which now takes some 28 seconds.
However, RANK() is only supported from SQL Server 2005 upwards.
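For reference, a self-contained version of that ranked approach (a sketch assuming SQL Server 2005+ and the Payment columns shown above) might look like:
SELECT lp.Person,
       lp.amount
FROM (
    SELECT a.Person,
           a.amount,
           RANK() OVER (PARTITION BY a.Person
                        ORDER BY a.payment_date DESC, a.receipt_no DESC) AS ranked
    FROM Payment a
    WHERE (a.reversed NOT IN ('R') OR a.reversed IS NULL)
      AND a.type <> 'GX'
) lp
WHERE lp.ranked = 1;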