The data in my table looks like this:
date, app, country, sales
2017-01-01,XYZ,US,10000
2017-01-01,XYZ,GB,2000
2017-01-02,XYZ,US,30000
2017-01-02,XYZ,GB,1000
I need to find, for each app on a daily basis, the ratio of US sales to GB sales, so ideally the result would look like this:
date, app, ratio
2017-01-01,XYZ,10000/2000 = 5
2017-01-02,XYZ,30000/1000 = 30
I'm currently dumping everything into a CSV and doing my calculations offline in Python, but I want to move everything onto the SQL side. One option would be to select each country in a subquery, join, and then divide, such as:
select d1_us.date, d1_us.app, d1_us.sales / d1_gb.sales from
(select date, app, sales from table where date between '2017-01-01' and '2017-01-10' and country = 'US') as d1_us
join
(select date, app, sales from table where date between '2017-01-01' and '2017-01-10' and country = 'GB') as d1_gb
on d1_us.app = d1_gb.app and d1_us.date = d1_gb.date
Is there a less messy way to go about doing this?
You can divide one SUM(CASE WHEN ...) by another and GROUP BY date and app, which gives the ratio without requiring a subquery.
SELECT DATE,
APP,
SUM(CASE WHEN COUNTRY = 'US' THEN SALES ELSE 0 END) /
SUM(CASE WHEN COUNTRY = 'GB' THEN SALES END) AS RATIO
FROM TABLE1
GROUP BY DATE, APP;
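If GB sales can be zero for a given date and app, guard the divisor to avoid a divide-by-zero error, for example by wrapping the GB sum in NULLIF(..., 0) so that the ratio comes back as NULL. Adding ELSE 1 to the GB CASE also avoids the error, but it adds 1 for every non-GB row and therefore skews the ratio. It really depends on how you want to handle these exceptions.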
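As a rough check of the conditional-aggregation approach, the sketch below loads the four sample rows into an in-memory SQLite database from Python and runs the same kind of query. The table name sales_data, the helper us_gb_ratio, and the NULLIF guard on the divisor are assumptions added for illustration; the 1.0 multiplier is only there because SQLite would otherwise do integer division.

```python
import sqlite3

# Sample rows from the question: (date, app, country, sales)
ROWS = [
    ("2017-01-01", "XYZ", "US", 10000),
    ("2017-01-01", "XYZ", "GB", 2000),
    ("2017-01-02", "XYZ", "US", 30000),
    ("2017-01-02", "XYZ", "GB", 1000),
]

QUERY = """
SELECT date, app,
       1.0 * SUM(CASE WHEN country = 'US' THEN sales ELSE 0 END)
       / NULLIF(SUM(CASE WHEN country = 'GB' THEN sales END), 0) AS ratio
FROM sales_data
GROUP BY date, app
ORDER BY date, app
"""


def us_gb_ratio(rows):
    """Return (date, app, ratio) tuples; ratio is None when GB sales are 0 or missing."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales_data (date TEXT, app TEXT, country TEXT, sales INTEGER)"
    )
    conn.executemany("INSERT INTO sales_data VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in us_gb_ratio(ROWS):
        print(row)
```

With the sample data this should give a ratio of 5.0 for 2017-01-01 and 30.0 for 2017-01-02; a date whose GB sales sum to zero comes back with a NULL ratio (None in Python) rather than an error.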
You can use one query with grouping and provide the condition once:
SELECT date, app,
SUM(CASE WHEN country = 'US' THEN SALES ELSE 0 END) /
SUM(CASE WHEN country = 'GB' THEN SALES END) AS ratio
FROM your_table
WHERE date BETWEEN '2017-01-01' AND '2017-01-10'
GROUP BY date, app;
However, this gives you zero if there are no records for US and NULL if there are no records for GB. If you need to return different values for those cases, you can use another CASE WHEN surrounding the division. For example, to return -1 and -2 respectively, you can use:
SELECT date, app,
-- COUNT only counts non-NULL values, so the inner CASE must not have an ELSE branch
CASE WHEN COUNT(CASE WHEN country = 'US' THEN 1 END) = 0 THEN -1
WHEN COUNT(CASE WHEN country = 'GB' THEN 1 END) = 0 THEN -2
ELSE SUM(CASE WHEN country = 'US' THEN SALES ELSE 0 END) /
SUM(CASE WHEN country = 'GB' THEN SALES END)
END AS ratio
FROM your_table
WHERE date BETWEEN '2017-01-01' AND '2017-01-10'
GROUP BY date, app;
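One detail worth checking in a query of this shape: COUNT counts every non-NULL value, so the inner CASE has to yield NULL (no ELSE branch) for rows of other countries, otherwise the count is never zero. A minimal sketch, assuming an in-memory SQLite table named sales_data and a hypothetical helper ratio_with_sentinels (neither is from the original post):

```python
import sqlite3

QUERY = """
SELECT date, app,
       CASE WHEN COUNT(CASE WHEN country = 'US' THEN 1 END) = 0 THEN -1
            WHEN COUNT(CASE WHEN country = 'GB' THEN 1 END) = 0 THEN -2
            ELSE 1.0 * SUM(CASE WHEN country = 'US' THEN sales ELSE 0 END)
                 / SUM(CASE WHEN country = 'GB' THEN sales END)
       END AS ratio
FROM sales_data
GROUP BY date, app
ORDER BY date, app
"""


def ratio_with_sentinels(rows):
    """Return (date, app, ratio); -1 means no US rows, -2 means no GB rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales_data (date TEXT, app TEXT, country TEXT, sales INTEGER)"
    )
    conn.executemany("INSERT INTO sales_data VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    # Made-up rows: one complete day, one day without US, one day without GB.
    demo = [
        ("2017-01-01", "XYZ", "US", 10000),
        ("2017-01-01", "XYZ", "GB", 2000),
        ("2017-01-02", "XYZ", "GB", 1000),
        ("2017-01-03", "XYZ", "US", 30000),
    ]
    for row in ratio_with_sentinels(demo):
        print(row)
```

The sentinel values -1 and -2 are the ones used in the answer; any other marker would work the same way.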
DROP TABLE IF EXISTS t;
CREATE TABLE t (
date DATE,
app VARCHAR(5),
country VARCHAR(5),
sales DECIMAL(10,2)
);
INSERT INTO t VALUES
('2017-01-01','XYZ','US',10000),
('2017-01-01','XYZ','GB',2000),
('2017-01-02','XYZ','US',30000),
('2017-01-02','XYZ','GB',1000);
WITH q AS (
SELECT
date,
app,
country,
SUM(sales) AS sales
FROM t
GROUP BY date, app, country
) SELECT
q1.date,
q1.app,
q1.country || ' vs ' || NVL(q2.country,'-') AS ratio_between,
CASE WHEN q2.sales IS NULL OR q2.sales = 0 THEN 0 ELSE ROUND(q1.sales / q2.sales, 2) END AS ratio
FROM q AS q1
LEFT JOIN q AS q2 ON q2.date = q1.date AND
q2.app = q1.app AND
q2.country != q1.country
-- WHERE q1.country = 'US'
ORDER BY q1.date;
Results for any country vs any country (WHERE q1.country='US' is commented out)
date,app,ratio_between,ratio
2017-01-01,XYZ,GB vs US,0.20
2017-01-01,XYZ,US vs GB,5.00
2017-01-02,XYZ,GB vs US,0.03
2017-01-02,XYZ,US vs GB,30.00
Results for US vs any other country (WHERE q1.country='US' uncommented)
date,app,ratio_between,ratio
2017-01-01,XYZ,US vs GB,5.00
2017-01-02,XYZ,US vs GB,30.00
The trick is in the JOIN clause.
The subquery q aggregates the data by date, app and country, and its result is joined with itself on date and app.
This way, for every date, app and country we get a "match" with every country on the same date and app. By adding q1.country != q2.country, we exclude the rows that pair a country with itself, highlighted below with *
date,app,country,sales,date,app,country,sales
*2017-01-01,XYZ,GB,2000.00,2017-01-01,XYZ,GB,2000.00*
2017-01-01,XYZ,GB,2000.00,2017-01-01,XYZ,US,10000.00
2017-01-01,XYZ,US,10000.00,2017-01-01,XYZ,GB,2000.00
*2017-01-01,XYZ,US,10000.00,2017-01-01,XYZ,US,10000.00*
2017-01-02,XYZ,GB,1000.00,2017-01-02,XYZ,US,30000.00
*2017-01-02,XYZ,GB,1000.00,2017-01-02,XYZ,GB,1000.00*
*2017-01-02,XYZ,US,30000.00,2017-01-02,XYZ,US,30000.00*
2017-01-02,XYZ,US,30000.00,2017-01-02,XYZ,GB,1000.00
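The self-join can also be tried out from Python against an in-memory SQLite database. This is only a sketch under a few assumptions: IFNULL stands in for NVL (which SQLite does not have), sales are multiplied by 1.0 to avoid integer division, and pairwise_ratios is a hypothetical helper name. It also shows what the LEFT JOIN does when a country has no partner on a given day.

```python
import sqlite3

QUERY = """
WITH q AS (
    SELECT date, app, country, SUM(sales) AS sales
    FROM t
    GROUP BY date, app, country
)
SELECT q1.date, q1.app,
       q1.country || ' vs ' || IFNULL(q2.country, '-') AS ratio_between,
       CASE WHEN q2.sales IS NULL OR q2.sales = 0 THEN 0
            ELSE ROUND(1.0 * q1.sales / q2.sales, 2) END AS ratio
FROM q AS q1
LEFT JOIN q AS q2
  ON q2.date = q1.date AND q2.app = q1.app AND q2.country != q1.country
ORDER BY q1.date, ratio_between
"""


def pairwise_ratios(rows):
    """Return (date, app, 'A vs B', ratio) for every ordered pair of countries."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (date TEXT, app TEXT, country TEXT, sales INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    demo = [
        ("2017-01-01", "XYZ", "US", 10000),
        ("2017-01-01", "XYZ", "GB", 2000),
        ("2017-01-03", "XYZ", "US", 500),  # made-up row with no GB partner
    ]
    for row in pairwise_ratios(demo):
        print(row)
```

A country without a partner row still appears once, paired with '-' and a ratio of 0, because the LEFT JOIN keeps the unmatched q1 row.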
I am trying to get the query to list the advisors and provide a count of active students for each. I can get it to list the advisors who have 1 student and exclude those with more than 1, but I cannot get it to return the advisors with a 0 or NULL count.
Select Advisors.AdvisorID, Advisors.FirstName, Advisors.LastName, COUNT(case Students.IsActive WHEN '1' then 1 else NULL end) AS "Number of Students"
FROM Advisors, Students
WHERE Advisors.AdvisorID=Students.AdvisorID
GROUP BY Advisors.AdvisorID, Advisors.FirstName, Advisors.LastName
HAVING COUNT(case Students.IsActive WHEN '1' then 1 else NULL end)='1'
counts the active students and returns the advisor with one student, but the advisors with 0 students are missing from the result. What am I missing?
Select Advisors.AdvisorID, Advisors.FirstName, Advisors.LastName, COUNT(case Students.IsActive WHEN '1' then 1 else NULL end) AS "Number of Students"
FROM Advisors, Students
WHERE Advisors.AdvisorID=Students.AdvisorID
GROUP BY Advisors.AdvisorID, Advisors.FirstName, Advisors.LastName
HAVING COUNT(case Students.IsActive WHEN '1' then 1 else NULL end) IS NULL
comes back with the column names and no data. I have double-checked the tables: the Advisors table has 3 entries. One advisor has 2 active students and one inactive student (IsActive is a bit, 0 or 1), one has no students, and one has one student.
Using <= 1 or < 1 similarly results in blank data.
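Please use ANSI JOIN syntax. The comma join with a WHERE condition is an inner join, so advisors without students are dropped before the count is computed; a LEFT JOIN keeps them.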
Select Advisors.AdvisorID,
Advisors.FirstName,
Advisors.LastName,
COUNT(case Students.IsActive WHEN '1' then 1 else NULL end) AS "Number of Active Students"
FROM Advisors
LEFT JOIN Students
ON Advisors.AdvisorID=Students.AdvisorID
GROUP BY Advisors.AdvisorID,
Advisors.FirstName,
Advisors.LastName
HAVING COUNT(case Students.IsActive WHEN '1' then 1 else NULL end) <= 1
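To see why the join type matters here, the sketch below rebuilds the situation described in the question (three advisors: one with two active students and one inactive, one with none, one with one) in an in-memory SQLite database. The advisor names, the StudentID column, and the helper active_student_counts are made up for illustration. Note also that COUNT returns 0 rather than NULL for an empty group, which is why a HAVING ... IS NULL test never matches.

```python
import sqlite3

QUERY = """
SELECT a.AdvisorID, a.FirstName, a.LastName,
       COUNT(CASE WHEN s.IsActive = 1 THEN 1 END) AS active_students
FROM Advisors a
LEFT JOIN Students s ON a.AdvisorID = s.AdvisorID
GROUP BY a.AdvisorID, a.FirstName, a.LastName
ORDER BY a.AdvisorID
"""


def active_student_counts(advisors, students):
    """Return one row per advisor, including advisors with no students (count 0)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Advisors (AdvisorID INTEGER, FirstName TEXT, LastName TEXT)"
    )
    conn.execute(
        "CREATE TABLE Students (StudentID INTEGER, AdvisorID INTEGER, IsActive INTEGER)"
    )
    conn.executemany("INSERT INTO Advisors VALUES (?, ?, ?)", advisors)
    conn.executemany("INSERT INTO Students VALUES (?, ?, ?)", students)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    # Hypothetical data matching the shape described in the question.
    advisors = [(1, "Ann", "Lee"), (2, "Bob", "Kim"), (3, "Cy", "Ray")]
    students = [(10, 1, 1), (11, 1, 1), (12, 1, 0), (13, 3, 1)]
    for row in active_student_counts(advisors, students):
        print(row)
```

Without a HAVING clause every advisor is listed with its active count; a filter such as HAVING active count <= 1 can then be added on top if only those advisors are wanted.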
For my example, I have (fake) crime data with three columns: city, number of crimes committed, and time period (containing time periods 1 and 2). I need to create a table with city as one column and crime_reduced as another, which is an indicator for whether the crimes committed decreased from time period 1 to time period 2.
How can I set up a condition to test that crimes_committed in period 2 is less than crimes_committed in period 1? My constraint is that I cannot save a physical copy of a table, so I cannot split my table into one with time period 1 and another with time period 2. I tried the following code with a case expression, which in retrospect makes no sense.
SELECT city,
CASE WHEN time_period = 1 AND crimes_committed > time_period = 2
AND crimes_committed THEN 1
ELSE 0 END AS crime_reduced
FROM crime_data
GROUP BY city;
Edit: Unfortunately, I couldn't get the CASE SIGN expression to work (it might be a platform problem). That led to this question, though: is there any way to embed a case expression within a case (this would allow for proper results without creating subqueries)? Something that would look like the query below (this does not work in Teradata):
SELECT city,
SUM(CASE WHEN
(CASE WHEN time_period = 1 THEN crimes_committed END) > (CASE WHEN time_period = 2
THEN crimes_committed END)
THEN 1 ELSE 0 END) AS crime_reduced
FROM crime_data
GROUP BY city;
You could join two subqueries of the table, each querying a different period:
SELECT t1.city,
CASE WHEN t1.crimes_committed > t2.crimes_committed THEN 'Yes'
ELSE 'No'
END AS crimes_reduced
FROM (SELECT city, crimes_committed
FROM crime_data
WHERE time_period = 1) t1
JOIN (SELECT city, crimes_committed
FROM crime_data
WHERE time_period = 2) t2 ON t1.city = t2.city;
You need conditional aggregation inside the CASE:
SELECT city,
CASE SIGN(
-- data for 1st period
MAX(CASE WHEN time_period = 1 THEN crimes_committed END)
-- data for 2nd period
- MAX(CASE WHEN time_period = 2 THEN crimes_committed END))
WHEN 0 THEN 'Same'
WHEN 1 THEN 'Decreased'
WHEN -1 THEN 'Increased'
ELSE 'Unknown (no data)'
END
FROM crime_data
GROUP BY city;
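If SIGN is not available or misbehaves on a given platform, the same idea can be expressed by comparing the two conditional aggregates directly, which also gives the 0/1 crime_reduced indicator the question asks for. A minimal sketch against an in-memory SQLite database; the city names and counts are made up, and crime_reduced_flags is a hypothetical helper name:

```python
import sqlite3

QUERY = """
SELECT city,
       CASE WHEN MAX(CASE WHEN time_period = 1 THEN crimes_committed END)
               > MAX(CASE WHEN time_period = 2 THEN crimes_committed END)
            THEN 1 ELSE 0 END AS crime_reduced
FROM crime_data
GROUP BY city
ORDER BY city
"""


def crime_reduced_flags(rows):
    """Return (city, crime_reduced); the flag is 0 when either period is missing."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE crime_data (city TEXT, time_period INTEGER, crimes_committed INTEGER)"
    )
    conn.executemany("INSERT INTO crime_data VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    demo = [
        ("Avon", 1, 50), ("Avon", 2, 40),    # decreased
        ("Brill", 1, 10), ("Brill", 2, 30),  # increased
        ("Cove", 1, 5),                      # no data for period 2
    ]
    for row in crime_reduced_flags(demo):
        print(row)
```

The inner CASE expressions are aggregated first (MAX works because there is one row per city and period), and only then compared, so no subquery or physical copy of the table is needed.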
So I have two tables: one is the invalid table and the other is the valid table.
valid table:
id
status
date
invalid table:
id
status
date
I have to produce a report with this output:
date       on-time  late  total  valid  invalid1  invalid2  total  rate
---------  -------  ----  -----  -----  --------  --------  -----  ----
9/10/2011  4        10    14     3      3         3         6
date: the field common to both tables and the field to group by (all counts are per day)
on-time: count of all the ids in the valid table
late: count of all the records (ids) in the invalid table
total: sum of on-time and late
valid: count of ids in the valid table with the "valid" status
invalid1: count of ids in the invalid table with the "invalid1" status
invalid2: count of ids in the invalid table with the "invalid2" status
total: sum of valid, invalid1, and invalid2
rate: average of the totals
It's basically multiple queries against different tables. How can I achieve it?
Something like this?
SELECT
*,
(result.total + result._total) / 2 AS rate
FROM (
SELECT
date,
SUM(CASE WHEN data.valid = 1 THEN 1 ELSE 0 END) AS ontime,
SUM(CASE WHEN data.valid = 0 THEN 1 ELSE 0 END) AS late,
COUNT(*) AS total,
SUM(CASE WHEN data.valid = 1 AND data.status = 'valid' THEN 1 ELSE 0 END) AS valid,
SUM(CASE WHEN data.valid = 0 AND data.status = 'invalid1' THEN 1 ELSE 0 END) AS invalid1,
SUM(CASE WHEN data.valid = 0 AND data.status = 'invalid2' THEN 1 ELSE 0 END) AS invalid2,
SUM(CASE WHEN data.status IN ('valid', 'invalid1', 'invalid2') THEN 1 ELSE 0 END) AS _total
FROM (
SELECT
date,
status,
1 AS valid
FROM
Valid
UNION ALL
SELECT
date,
status,
0 AS valid
FROM
InValid ) AS data
GROUP BY
date) AS result
SELECT date, ontime, late, ontime + late AS total, valid, invalid1, invalid2, valid + invalid1 + invalid2 AS total
FROM
(SELECT date,
COUNT(*) AS late,
COUNT(IIF(status = 'invalid1', 1, NULL)) AS invalid1,
COUNT(IIF(status = 'invalid2', 1, NULL)) AS invalid2
FROM invalid
GROUP BY date
) AS i JOIN (
SELECT date,
COUNT(*) AS ontime,
COUNT(IIF(status = 'valid', 1, NULL)) AS valid
FROM valid
GROUP BY date
) AS v USING (date)
First of all, it seems that you are holding exactly the same information in 2 tables. I would recommend merging those tables and adding a boolean column called valid to hold the validity of each record.
A query on your existing DB structure might look something like this:
SELECT unioned.date, SUM(unioned.valid) AS valid, SUM(unioned.invalid1) AS invalid1, SUM(unioned.invalid2) AS invalid2 FROM (
( SELECT v.date AS date, COUNT(v.id) AS valid, 0 AS invalid1, 0 AS invalid2 FROM valid v WHERE v.status = 'valid' GROUP BY v.date)
UNION ALL
( SELECT i1.date AS date, 0 AS valid, COUNT(i1.id) AS invalid1, 0 AS invalid2 FROM invalid i1 WHERE i1.status = 'invalid1' GROUP BY i1.date)
UNION ALL
( SELECT i2.date AS date, 0 AS valid, 0 AS invalid1, COUNT(i2.id) AS invalid2 FROM invalid i2 WHERE i2.status = 'invalid2' GROUP BY i2.date)
) AS unioned GROUP BY unioned.date
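The suggested merge with a validity flag can be emulated without changing the schema by tagging each table inside a UNION ALL and aggregating once. The sketch below tries this on an in-memory SQLite database; the column name is_valid, the 'pending' status, the sample ids, and the helper daily_report are assumptions made for illustration and do not come from the original tables.

```python
import sqlite3

QUERY = """
SELECT date,
       SUM(is_valid) AS ontime,
       SUM(1 - is_valid) AS late,
       COUNT(*) AS total,
       SUM(CASE WHEN is_valid = 1 AND status = 'valid' THEN 1 ELSE 0 END) AS valid,
       SUM(CASE WHEN is_valid = 0 AND status = 'invalid1' THEN 1 ELSE 0 END) AS invalid1,
       SUM(CASE WHEN is_valid = 0 AND status = 'invalid2' THEN 1 ELSE 0 END) AS invalid2
FROM (
    SELECT date, status, 1 AS is_valid FROM valid
    UNION ALL
    SELECT date, status, 0 AS is_valid FROM invalid
) AS data
GROUP BY date
ORDER BY date
"""


def daily_report(valid_rows, invalid_rows):
    """Return (date, ontime, late, total, valid, invalid1, invalid2) per day."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE valid (id INTEGER, status TEXT, date TEXT)")
    conn.execute("CREATE TABLE invalid (id INTEGER, status TEXT, date TEXT)")
    conn.executemany("INSERT INTO valid VALUES (?, ?, ?)", valid_rows)
    conn.executemany("INSERT INTO invalid VALUES (?, ?, ?)", invalid_rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    # Made-up rows; 'pending' is a hypothetical status that is not counted as valid.
    valid_rows = [
        (1, "valid", "2011-09-10"),
        (2, "valid", "2011-09-10"),
        (3, "pending", "2011-09-10"),
    ]
    invalid_rows = [
        (4, "invalid1", "2011-09-10"),
        (5, "invalid2", "2011-09-10"),
        (6, "invalid2", "2011-09-10"),
    ]
    for row in daily_report(valid_rows, invalid_rows):
        print(row)
```

UNION ALL is used rather than UNION so that identical (date, status) pairs are not collapsed before counting; the second total and the rate can be computed from these columns in an outer query.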