How to combine two results from the same table - SQL

I'm looking for a way of pulling totals from a transactions table. Sales and return transactions are differentiated by a column, but the value is always stored as a positive.
I've managed to pull the different transaction type totals, grouped by product, as separate rows:
SELECT `type`,
       `product`,
       SUM(`final_price`) AS `value`,
       COUNT(`final_price`) AS `count`
FROM `transactions`  -- table name assumed from the description above
GROUP BY `product`, `type`
The result is:
Type | Product | Value | Count
S | 1 | 1000 | 2
S | 4 | 750 | 3
S | 2 | 300 | 2
S | 3 | 10 | 1
R | 1 | 500 | 1
Ideally, I'd like the totals displayed on a single row per product, in separate columns, for ordering purposes. The ideal result would be:
Type | Product | s_value | s_count | r_value | r_count
S | 1 | 1000 | 2 | 500 | 1
S | 4 | 750 | 3 | 0 | 0
S | 2 | 300 | 2 | 0 | 0
S | 3 | 10 | 1 | 0 | 0
I've tried union all and left joins with no joy so far.

You can use case expressions to differentiate by the type of transaction:
SELECT `product`,
       SUM(CASE `type` WHEN 'S' THEN `final_price` END) AS `s_value`,
       COUNT(CASE `type` WHEN 'S' THEN `final_price` END) AS `s_count`,
       SUM(CASE `type` WHEN 'R' THEN `final_price` END) AS `r_value`,
       COUNT(CASE `type` WHEN 'R' THEN `final_price` END) AS `r_count`
FROM `transactions`
GROUP BY `product`
EDIT:
From the backticks around the column names I'm assuming this is a MySQL question, even though it's not explicitly tagged as such.
If that's the case, you can simplify the count expressions by using MySQL's automatic conversion from boolean to integer, which treats true as 1 and false as 0:
SELECT `product`,
       SUM(CASE `type` WHEN 'S' THEN `final_price` END) AS `s_value`,
       SUM(`type` = 'S') AS `s_count`,
       SUM(CASE `type` WHEN 'R' THEN `final_price` END) AS `r_value`,
       SUM(`type` = 'R') AS `r_count`
FROM `transactions`
GROUP BY `product`
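Since the extra columns are wanted "for ordering purposes", you can ORDER BY them directly. For example, to list products by sales total, highest first, matching the ideal result above (still assuming a table named transactions):
SELECT `product`,
       SUM(CASE `type` WHEN 'S' THEN `final_price` END) AS `s_value`,
       SUM(`type` = 'S') AS `s_count`,
       SUM(CASE `type` WHEN 'R' THEN `final_price` END) AS `r_value`,
       SUM(`type` = 'R') AS `r_count`
FROM `transactions`
GROUP BY `product`
ORDER BY `s_value` DESC  -- sort by the sales total, highest first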

Related

Optimise SQL Query with SUM and Case

I have the following query, which takes more than a minute to return data:
SELECT extract(HOUR FROM date) AS HOUR,
       SUM(CASE WHEN country_name = 'France' THEN atdelay ELSE 0 END) AS France,
       SUM(CASE WHEN country_name = 'USA' THEN atdelay ELSE 0 END) AS USA,
       SUM(CASE WHEN country_name = 'China' THEN atdelay ELSE 0 END) AS China,
       SUM(CASE WHEN country_name = 'Brezil' THEN atdelay ELSE 0 END) AS Brazil,
       SUM(CASE WHEN country_name = 'Argentine' THEN atdelay ELSE 0 END) AS Argentine,
       SUM(CASE WHEN country_name = 'Equator' THEN atdelay ELSE 0 END) AS Equator,
       SUM(CASE WHEN country_name = 'Maroc' THEN atdelay ELSE 0 END) AS Maroc,
       SUM(CASE WHEN country_name = 'Egypt' THEN atdelay ELSE 0 END) AS Egypt
FROM
  (SELECT *
   FROM Contry
   WHERE (TO_CHAR(entrydate, 'YYYY-MM-DD')::DATE) >= '2021-01-01'
     AND (TO_CHAR(entrydate, 'YYYY-MM-DD')::DATE) <= '2021-01-31'
     AND code IS NOT NULL) AS A
GROUP BY HOUR
ORDER BY HOUR ASC;
My table is structured like so:
+---------------------+---------------+------+-----+-------------------+-----------------------------+
| Field               | Type          | Null | Key | Default           | Extra                       |
+---------------------+---------------+------+-----+-------------------+-----------------------------+
| id                  | bigint(20)    | NO   | PRI | NULL              | auto_increment              |
| country_name        | varchar(30)   | YES  | MUL | NULL              |                             |
| date                | timestamp     | NO   | MUL | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
| entrydate           | timestamp     | NO   |     | NULL              |                             |
| keyword_count       | int(11)       | YES  |     | NULL              |                             |
| all_impressions     | int(11)       | YES  |     | NULL              |                             |
| all_clicks          | int(11)       | YES  |     | NULL              |                             |
| all_ctr             | float         | YES  |     | NULL              |                             |
| all_positions       | float         | YES  |     | NULL              |                             |
+---------------------+---------------+------+-----+-------------------+-----------------------------+
The current table size is closing in on 50 million rows.
How can I make this faster?
I'm hoping there is another query or table optimisation I can do - alternatively I could pre-aggregate the data but I'd rather avoid that.
(Your table definition doesn't look like you are really using Postgres, but as you tagged your question with Postgres I'll answer it nevertheless)
One obvious attempt would be to create an index on entrydate, then change your WHERE clause so that it can make use of that index. With timestamp columns and a range condition, it's usually better to use the "next day" as the upper limit together with < instead of <=:
WHERE entrydate >= date '2021-01-01'
AND entrydate < date '2021-02-01'
AND code IS NOT NULL
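For the date range alone, a plain index is enough; a minimal sketch, using the table name country as in the partial-index example below (the question's query spells it Contry):
-- Plain b-tree index so the rewritten range condition above can use it.
create index on country (entrydate);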
If the condition AND code IS NOT NULL removes many rows in addition to the date range, you can create a partial index:
create index on country (entrydate)
where code IS NOT NULL;
However, if a large fraction of the rows satisfies code IS NOT NULL, the additional filter won't help much.
Not performance related, but the conditional aggregation can be written more compactly using the FILTER clause:
sum(atdelay) filter (where country_name = 'France') as france
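Putting the rewritten WHERE clause and the FILTER clause together, the whole query could look roughly like this (a sketch only, keeping the question's table and column names, and assuming PostgreSQL 9.4+ for FILTER; note that FILTER yields NULL rather than 0 for hours where a country has no rows):
SELECT extract(HOUR FROM date) AS HOUR,
       sum(atdelay) FILTER (WHERE country_name = 'France') AS France,
       sum(atdelay) FILTER (WHERE country_name = 'USA') AS USA,
       sum(atdelay) FILTER (WHERE country_name = 'China') AS China,
       -- ... one line per remaining country ...
       sum(atdelay) FILTER (WHERE country_name = 'Egypt') AS Egypt
FROM Contry
WHERE entrydate >= date '2021-01-01'
  AND entrydate <  date '2021-02-01'
  AND code IS NOT NULL
GROUP BY HOUR
ORDER BY HOUR;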

Transposing lines containing Text to columns

I have a table just like this one:
+----+---------+-------------+------------+
| ID | Period  | Total Units | Source     |
+----+---------+-------------+------------+
| 1  | Past    | 400         | Competitor |
| 1  | Present | 250         | PAWS       |
| 2  | Past    | 3           | BP         |
| 2  | Present | 15          | BP         |
+----+---------+-------------+------------+
And I'm trying to transpose the lines into columns, so that for each ID, I have one unique line that compares past and present numbers and attributes, like the following:
+----+------------------+---------------------+-------------+----------------+
| ID | Total Units Past | Total Units Present | Source Past | Source Present |
+----+------------------+---------------------+-------------+----------------+
| 1  | 400              | 250                 | Competitor  | PAWS           |
| 2  | 3                | 15                  | BP          | BP             |
+----+------------------+---------------------+-------------+----------------+
Transposing the total units is not a problem, as I use SUM(CASE WHEN Period = 'Past' THEN Total_Units ELSE 0 END) AS Total_Units_Past.
However, I don't know how to proceed with the text columns. I've seen the PIVOT and UNPIVOT clauses used, but they all rely on an aggregate function at some point.
You can use conditional aggregation:
select id,
       sum(case when period = 'Past' then units else 0 end) as unitspast,
       sum(case when period = 'Present' then units else 0 end) as unitspresent,
       max(case when period = 'Past' then source end) as sourcepast,
       max(case when period = 'Present' then source end) as sourcepresent
from MyTable t
group by id;
Assuming you only have two rows per ID, you could also join:
Select a.ID, a.units as UnitsPast, a.source as SourcePast
, b.units as UnitsPresent, b.source as SourcePresent
from MyTable a
left join MyTable b
on a.ID = b.ID
and b.period = 'Present'
where a.period = 'Past'

One SQL query with multiple conditions

I am running an Oracle database and have two tables below.
#account
+--------+------------+----------+
| acc_id | date       | acc_type |
+--------+------------+----------+
| 1      | 11-07-2018 | customer |
| 2      | 01-11-2018 | customer |
| 3      | 02-09-2018 | employee |
| 4      | 01-09-2018 | customer |
+--------+------------+----------+
#credit_request
+-----------+------------+-------------+--------+---------------+
| credit_id | date       | credit_type | acc_id | credit_amount |
+-----------+------------+-------------+--------+---------------+
| 1112      | 01-08-2018 | failed      | 1      | 2200          |
| 1214      | 02-12-2018 | success     | 2      | 1500          |
| 1312      | 03-11-2018 | success     | 4      | 8750          |
| 1468      | 01-12-2018 | failed      | 2      | 3500          |
+-----------+------------+-------------+--------+---------------+
I want the following for each customer:
the last successful credit_request
sum of credit_amount of all failed credit_requests
Here is one method:
select a.acc_id, acr.num_fails, acr.failed_amount,
       acr.num_successes / nullif(acr.num_fails, 0) as ratio, -- seems weird. Why not just the failure rate?
       last_cr.credit_id, last_cr.date, last_cr.credit_amount
from account a left join
     (select acc_id,
             sum(case when credit_type = 'failed' then 1 else 0 end) as num_fails,
             sum(case when credit_type = 'failed' then credit_amount else 0 end) as failed_amount,
             sum(case when credit_type = 'success' then 1 else 0 end) as num_successes,
             max(case when credit_type = 'success' then date end) as max_success_date
      from credit_request
      group by acc_id
     ) acr
     on acr.acc_id = a.acc_id left join
     credit_request last_cr
     on last_cr.acc_id = acr.acc_id and last_cr.date = acr.max_success_date;
The following query should do the trick.
SELECT
acc_id,
MAX(CASE WHEN credit_type = 'success' AND rn = 1 THEN credit_id END) as last_successful_credit_id,
MAX(CASE WHEN credit_type = 'success' AND rn = 1 THEN cdate END) as last_successful_credit_date,
MAX(CASE WHEN credit_type = 'success' AND rn = 1 THEN credit_amount END) as last_successful_credit_amount,
SUM(CASE WHEN credit_type = 'failed' THEN credit_amount ELSE 0 END) total_amount_of_failed_credit,
SUM(CASE WHEN credit_type = 'failed' THEN 1 ELSE 0 END) / COUNT(*) ratio_success_request
FROM (
SELECT
a.acc_id,
a.cdate adate,
a.acc_type,
c.credit_id,
c.cdate,
c.credit_type,
c.credit_amount,
ROW_NUMBER() OVER(PARTITION BY c.acc_id, c.credit_type ORDER BY c.cdate DESC) rn
FROM
account a
LEFT JOIN credit_request c ON c.acc_id = a.acc_id
) x
GROUP BY acc_id
ORDER BY acc_id
The subquery assigns a sequence number to each record, within groups of account and credit type, using ROW_NUMBER(). The outer query does conditional aggregation to compute the different results you asked for.
With your test data, this returns:
ACC_ID | LAST_SUCCESSFUL_CREDIT_ID | LAST_SUCCESSFUL_CREDIT_DATE | LAST_SUCCESSFUL_CREDIT_AMOUNT | TOTAL_AMOUNT_OF_FAILED_CREDIT | RATIO_SUCCESS_REQUEST
-----: | -------------------------: | :--------------------------- | -----------------------------: | ----------------------------: | --------------------:
1 | null | null | null | 2200 | 1
2 | 1214 | 02-DEC-18 | 1500 | 3500 | .5
3 | null | null | null | 0 | 0
4 | 1312 | 03-NOV-18 | 8750 | 0 | 0
This might be what you are looking for. Since you did not show expected results, it might not be 100% accurate; feel free to adapt it.
I guess the query below is easy to understand and implement. Also, to avoid more and more conditions in the CASE expressions, you can make use of a WITH clause and reference it in the CASE expressions to reduce the query size.
SELECT a.acc_id,
       COUNT(DISTINCT c.credit_id) AS credit_count,
       MAX(CASE WHEN c.credit_type = 'success' THEN c.cdate END) AS last_success_date,
       SUM(CASE WHEN c.credit_type = 'failed' THEN c.credit_amount END) AS total_failed_amount,
       COUNT(CASE WHEN c.credit_type = 'success' THEN 1 END)
         / NULLIF(COUNT(CASE WHEN c.credit_type = 'failed' THEN 1 END), 0) AS success_failure_ratio
FROM account a
LEFT JOIN credit_request c ON a.acc_id = c.acc_id
WHERE a.acc_type = 'customer'
GROUP BY a.acc_id
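The WITH clause idea could look roughly like this; a sketch only, reusing the cdate column name from the previous answer and keeping just the two aggregates the question asks for:
-- Pre-filter the customers' credit requests once, then aggregate per account.
WITH cust_credits AS (
  SELECT a.acc_id, c.credit_type, c.cdate, c.credit_amount
  FROM account a
  LEFT JOIN credit_request c ON c.acc_id = a.acc_id
  WHERE a.acc_type = 'customer'
)
SELECT acc_id,
       MAX(CASE WHEN credit_type = 'success' THEN cdate END) AS last_success_date,
       SUM(CASE WHEN credit_type = 'failed' THEN credit_amount END) AS total_failed_amount
FROM cust_credits
GROUP BY acc_id
ORDER BY acc_id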

Aggregation for multiple SQL SELECT statements

I've got a table TABLE1 like this:
|--------------|--------------|--------------|
| POS          | TYPE         | VOLUME       |
|--------------|--------------|--------------|
| 1            | A            | 34           |
| 2            | A            | 2            |
| 1            | A            | 12           |
| 3            | B            | 200          |
| 4            | C            | 1            |
|--------------|--------------|--------------|
I want to get something like this (TABLE2):
|--------------|--------------|--------------|--------------|--------------|
| POS          | Amount_A     | Amount_B     | Amount_C     | Sum_Volume   |
|--------------|--------------|--------------|--------------|--------------|
| 1            | 2            | 0            | 0            | 46           |
| 2            | 1            | 0            | 0            | 2            |
| 3            | 0            | 1            | 0            | 200          |
| 4            | 0            | 0            | 1            | 1            |
|--------------|--------------|--------------|--------------|--------------|
My Code so far is:
SELECT
(SELECT COUNT(TYPE)
FROM TABLE1
WHERE TYPE = 'A') AS [Amount_A]
,(SELECT COUNT(TYPE)
FROM TABLE1
WHERE TYPE = 'B') AS [Amount_B]
,(SELECT COUNT(TYPE)
FROM TABLE1
WHERE TYPE = 'C') AS [Amount_C]
,(SELECT SUM(VOLUME)
  FROM TABLE1) AS [Sum_Volume]
INTO [TABLE2]
Now two Questions:
How can I include the distinction concerning POS?
Is there any better way to count each TYPE?
I am using MSSQLServer.
What you're looking for is to use GROUP BY, along with your Aggregate functions. So, this results in:
USE Sandbox;
GO
CREATE TABLE Table1 (Pos tinyint, [Type] char(1), Volume smallint);
INSERT INTO Table1
VALUES (1,'A',34 ),
(2,'A',2 ),
(1,'A',12 ),
(3,'B',200),
(4,'C',1 );
GO
SELECT Pos,
COUNT(CASE WHEN [Type] = 'A' THEN [Type] END) AS Amount_A,
COUNT(CASE WHEN [Type] = 'B' THEN [Type] END) AS Amount_B,
COUNT(CASE WHEN [Type] = 'C' THEN [Type] END) AS Amount_C,
SUM(Volume) As Sum_Volume
FROM Table1 T1
GROUP BY Pos;
DROP TABLE Table1;
GO
If you have a variable, and undefined, number of values for [Type], then you're most likely going to need to use dynamic SQL.
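A rough sketch of that dynamic approach, assuming SQL Server 2017+ for STRING_AGG (older versions would need FOR XML PATH instead):
-- Build one COUNT(CASE ...) column per distinct [Type] value, then execute the statement.
DECLARE @cols nvarchar(max), @sql nvarchar(max);

SELECT @cols = STRING_AGG(
           'COUNT(CASE WHEN [Type] = ''' + [Type] + ''' THEN [Type] END) AS [Amount_' + [Type] + ']',
           ', ')
FROM (SELECT DISTINCT [Type] FROM Table1) AS t;

SET @sql = N'SELECT Pos, ' + @cols + N', SUM(Volume) AS Sum_Volume FROM Table1 GROUP BY Pos;';

EXEC sys.sp_executesql @sql;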
Your first column should be POS, and you'll GROUP BY POS.
This will give you one row for each POS value, and aggregate (COUNT and SUM) accordingly.
You can also use CASE statements instead of subselects. For instance, instead of:
(SELECT COUNT(TYPE)
FROM TABLE1
WHERE TYPE = 'A') AS [Amount_A]
use:
COUNT(CASE WHEN TYPE = 'A' then 1 else NULL END) AS [Amount_A]

SQL query for retrieving different data series contained in the same table

I'm trying to retrieve data series contained in a table that basically looks like this:
row | timestamp | seriesId | int32 | int64 | double
---------------------------------------------------
  0 |         0 |        0 |     2 |       |
  1 |         1 |        0 |     4 |       |
  2 |         1 |        1 |   435 |       |
  3 |         1 |        2 |       |  2345 |
  4 |         1 |        3 |       |       |    0.5
  5 |         2 |        0 |     5 |       |
  6 |         2 |        1 |   453 |       |
  7 |         2 |        2 |       |  2401 |
....
I would like to get a result set that looks like this (so that I can easily plot it):
row | timestamp | series0 | series1 | series2 | ...
----------------------------------------------------
  0 |         0 |       2 |         |         |
  1 |         1 |       4 |     435 |    2345 |
  2 |         2 |       5 |     453 |    2401 |
...
My SQL skillz are unfortunately not really what they should be, so my first attempt at achieving this feels a bit awkward:
SELECT tbl0.timestamp, tbl0.int32 as series0,
tbl1.int32 as series1
FROM
(SELECT * FROM StreamData WHERE seriesId=0) as tbl0
INNER JOIN
(SELECT * FROM StreamData WHERE seriesId=1) as tbl1
ON tbl0.timestamp = tbl1.timestamp
ORDER BY tbl0.timestamp;
This doesn't really seem to be the right way of trying to accomplish this, especially not when the number of different series goes up. I can change the way data gets stored in the table (it's in an SQLite database if that matters) if that would make things easier, but as the number of different series may be different from time to time, I would prefer having them all in the same table.
Is there a better way to write the above query?
It seems you have to use GROUP BY:
SELECT row, timestamp, count(seriesId) AS series0, sum(int32) AS series1, sum(int64) AS series2
FROM StreamData
WHERE (seriesId = 0) OR (seriesId = 1)
GROUP BY timestamp
ORDER BY timestamp;
Just try!
It will only work if you know how many series you have stored in there. So compacting INT32, INT64 and DOUBLE down will work fine, but as you can have any number of seriesIds, there's a problem there.
Here's how to compact the nullable columns (ignoring the existence of seriesId):
SELECT
timestamp,
MAX(int32) AS series0,
MAX(int64) AS series1,
MAX(double) AS series2
FROM
StreamData
GROUP BY
timestamp
If you know the exact number of series, you could modify it as follows...
SELECT
timestamp,
MAX(CASE WHEN seriesID = 0 THEN int32 ELSE NULL END) AS series0,
MAX(CASE WHEN seriesID = 1 THEN int64 ELSE NULL END) AS series1,
MAX(CASE WHEN seriesID = 2 THEN double ELSE NULL END) AS series2,
MAX(CASE WHEN seriesID = 3 THEN int32 ELSE NULL END) AS series3,
MAX(CASE WHEN seriesID = 4 THEN int64 ELSE NULL END) AS series4,
MAX(CASE WHEN seriesID = 5 THEN double ELSE NULL END) AS series5
FROM
StreamData
GROUP BY
timestamp
But if you want the SQL to work all of this out itself, for any number of series, you'd have to write code that writes the SQL you need.
If you have a potentially variable number of seriesId values, you will need to assemble the SQL query dynamically. It would have to look like this:
select
TimeStamp,
Max(case seriesId when 0 then coalesce(int32, int64) else null end) series0,
Max(case seriesId when 1 then coalesce(int32, int64) else null end) series1,
Max(case seriesId when 2 then coalesce(int32, int64) else null end) series2,
Max(case seriesId when 3 then coalesce(int32, int64) else null end) series3,
Max(case seriesId when 4 then coalesce(int32, int64) else null end) series4,
Max(case seriesId when 5 then coalesce(int32, int64) else null end) series5,
Max(case seriesId when 6 then coalesce(int32, int64) else null end) series6
from StreamData
group by TimeStamp
order by TimeStamp
Also, from your data sample I understood that each row has either int32 or int64 populated, depending on whether int32 is null, hence the coalesce.
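Both answers come down to the same point: for an arbitrary number of series, the statement has to be generated. One way is to let SQLite emit it for you; a sketch assuming the StreamData layout above, with the generated text then executed by the application:
-- Emit a pivot query with one MAX(CASE ...) column per distinct seriesId.
SELECT 'SELECT timestamp, '
       || group_concat(
            'MAX(CASE WHEN seriesId = ' || seriesId
            || ' THEN coalesce(int32, int64, double) END) AS series' || seriesId,
            ', ')
       || ' FROM StreamData GROUP BY timestamp ORDER BY timestamp;' AS generated_sql
FROM (SELECT DISTINCT seriesId FROM StreamData ORDER BY seriesId);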