SQL - divide one column by another

I have the following code:
select c.category
,sum(b.is_open) as open
,count(b.name) as total
from business b inner join category c on b.id=c.business_id
group by c.category
order by sum(b.is_open) desc
limit 10
which gives me the following dataset:
+------------------+------+-------+
| category | open | total |
+------------------+------+-------+
| Restaurants | 53 | 71 |
| Shopping | 25 | 30 |
| Food | 20 | 23 |
| Health & Medical | 16 | 17 |
| Home Services | 15 | 16 |
| Beauty & Spas | 12 | 13 |
| Nightlife | 12 | 20 |
| Bars | 11 | 17 |
| Active Life | 10 | 10 |
| Local Services | 10 | 12 |
+------------------+------+-------+
However, if I change lines 2 and 3 to:
sum(b.is_open) / count(b.name) as '%'
instead of the two separate values, I get zeroes throughout. I tried casting both columns to a decimal type (although they look like they were decimal to begin with), but that did not work. Why can't I get the right results? I am writing my queries in SQLite.

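SUM() over an integer column and COUNT() both return integers, so SQLite performs integer division, which discards the fractional part: every ratio below 1 becomes 0. Casting to DECIMAL does not help in SQLite, because DECIMAL gets NUMERIC affinity, which leaves integer values as integers. Try using floating-point arithmetic instead of integer arithmetic: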
1.0 * sum(b.is_open) / count(b.name) as '%'
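To see the difference directly, here is a minimal sketch that runs the same GROUP BY through Python's built-in sqlite3 module. The two tables are a hypothetical miniature of the question's schema (only the columns the query uses), so the numbers are illustrative, not the question's data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical miniature of the question's schema, just enough to run the query.
conn.executescript("""
CREATE TABLE business (id INTEGER PRIMARY KEY, name TEXT, is_open INTEGER);
CREATE TABLE category (business_id INTEGER, category TEXT);
INSERT INTO business VALUES (1, 'A', 1), (2, 'B', 1), (3, 'C', 0), (4, 'D', 1);
INSERT INTO category VALUES (1, 'Bars'), (2, 'Bars'), (3, 'Bars'), (4, 'Food');
""")

def ratios(expr):
    """Run the question's GROUP BY with a given ratio expression."""
    sql = f"""
        SELECT c.category, {expr} AS ratio
        FROM business b INNER JOIN category c ON b.id = c.business_id
        GROUP BY c.category
        ORDER BY c.category
    """
    return conn.execute(sql).fetchall()

# INTEGER / INTEGER: the fractional part is discarded (Bars: 2 / 3 -> 0).
print(ratios("sum(b.is_open) / count(b.name)"))
# Multiplying by 1.0 makes the numerator REAL, so the fraction is kept.
print(ratios("1.0 * sum(b.is_open) / count(b.name)"))
# DECIMAL has NUMERIC affinity in SQLite, which leaves an integer as an integer.
print(ratios("CAST(sum(b.is_open) AS DECIMAL) / count(b.name)"))
# Casting to REAL is an explicit alternative to the 1.0 * trick.
print(ratios("CAST(sum(b.is_open) AS REAL) / count(b.name)"))
```

The `1.0 *` form and `CAST(... AS REAL)` are equivalent here; the cast simply states the intent more explicitly.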

Related

SQL to Get Latest Field Value

I'm trying to write an SQL query (SQL Server) that returns the latest value of a field from a history table.
The table structure is basically as below:
ISSUE TABLE:
issueid
10
20
30
CHANGEGROUP TABLE:
changegroupid | issueid | updated |
1 | 10 | 01/01/2020 |
2 | 10 | 02/01/2020 |
3 | 10 | 03/01/2020 |
4 | 20 | 05/01/2020 |
5 | 20 | 06/01/2020 |
6 | 20 | 07/01/2020 |
7 | 30 | 04/01/2020 |
8 | 30 | 05/01/2020 |
9 | 30 | 06/01/2020 |
CHANGEITEM TABLE:
changegroupid | field | newvalue |
1 | ONE | 1 |
1 | TWO | A |
1 | THREE | Z |
2 | ONE | J |
2 | ONE | K |
2 | ONE | L |
3 | THREE | K |
3 | ONE | 2 |
3 | ONE | 1 | <--
4 | ONE | 1A |
5 | ONE | 1B |
6 | ONE | 1C | <--
7 | ONE | 1D |
8 | ONE | 1E |
9 | ONE | 1F | <--
EXPECTED RESULT:
issueid | updated | newvalue
10 | 03/01/2020 | 1
20 | 07/01/2020 | 1C
30 | 06/01/2020 | 1F
So each change to an issue creates one change group record with the date of the change, and each change group can contain one or more change item records.
Each change item shows the field name that was changed and the new value.
I then need to link those tables together to get each issue, the latest value of the field name called 'ONE', and ideally the date of the latest change.
These tables are from Jira, for those familiar with that table structure.
I've been trying to get this to work for a while now; so far I've got this query:
SELECT IssueId, MAX(Created) AS updated FROM
(
SELECT ISSUE.IssueId, UpdGrp.Created as Created, UpdItm.NEWVALUE
FROM ISSUE
JOIN ChangeGroup UpdGrp ON (UpdGrp.IssueID = ISSUE.IssueId)
JOIN CHANGEITEM UpdItm ON (UpdGrp.ID = UpdItm.groupid)
WHERE UPPER(UpdItm.FIELD) = UPPER('ONE')
) AS dummy
GROUP BY IssueId
ORDER BY IssueId
This returns the first two columns I'm looking for, but I'm struggling to return the final column: when I add NEWVALUE to the outer SELECT list, I get the error "Column is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause."
I've done a search on here and can't find anything that exactly matches my requirements.
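Use window functions. ROW_NUMBER() numbers each issue's 'ONE' changes from newest to oldest, so keeping only seqnum = 1 returns one row per issue, and unlike GROUP BY it lets you return non-aggregated columns such as NEWVALUE. If a single change group can contain several rows for the same field (as change group 3 does in the sample), add a tiebreaker such as the change item's ID to the ORDER BY; otherwise which of those rows is picked is arbitrary: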
SELECT x.IssueId, x.Created AS updated, x.NEWVALUE
FROM (SELECT i.IssueId, cg.Created, ci.NEWVALUE,
             ROW_NUMBER() OVER (PARTITION BY i.IssueId ORDER BY cg.Created DESC) AS seqnum
      FROM ISSUE i JOIN
           ChangeGroup cg
           ON cg.IssueID = i.IssueId JOIN
           CHANGEITEM ci
           ON cg.ID = ci.groupid
      WHERE UPPER(ci.FIELD) = 'ONE'
     ) x
WHERE x.seqnum = 1
ORDER BY x.IssueId;
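As a sketch of the ROW_NUMBER() approach on the question's sample data, the following runs it in SQLite through Python's sqlite3 module (window functions need SQLite 3.25 or later). It uses the simplified column names from the sample tables rather than Jira's real ones, rewrites the dates as ISO-8601 text so they sort correctly (reading them as DD/MM/YYYY; the order is the same either way), and adds a hypothetical surrogate key `id` to the change items as the tiebreaker within a change group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE issue (issueid INTEGER PRIMARY KEY);
CREATE TABLE changegroup (changegroupid INTEGER PRIMARY KEY, issueid INTEGER, updated TEXT);
-- id is a hypothetical surrogate key, used only as a tiebreaker inside a change group.
CREATE TABLE changeitem (id INTEGER PRIMARY KEY, changegroupid INTEGER, field TEXT, newvalue TEXT);
INSERT INTO issue VALUES (10), (20), (30);
INSERT INTO changegroup VALUES
  (1, 10, '2020-01-01'), (2, 10, '2020-01-02'), (3, 10, '2020-01-03'),
  (4, 20, '2020-01-05'), (5, 20, '2020-01-06'), (6, 20, '2020-01-07'),
  (7, 30, '2020-01-04'), (8, 30, '2020-01-05'), (9, 30, '2020-01-06');
INSERT INTO changeitem (changegroupid, field, newvalue) VALUES
  (1, 'ONE', '1'), (1, 'TWO', 'A'), (1, 'THREE', 'Z'),
  (2, 'ONE', 'J'), (2, 'ONE', 'K'), (2, 'ONE', 'L'),
  (3, 'THREE', 'K'), (3, 'ONE', '2'), (3, 'ONE', '1'),
  (4, 'ONE', '1A'), (5, 'ONE', '1B'), (6, 'ONE', '1C'),
  (7, 'ONE', '1D'), (8, 'ONE', '1E'), (9, 'ONE', '1F');
""")

LATEST_SQL = """
SELECT issueid, updated, newvalue
FROM (
    SELECT i.issueid, cg.updated, ci.newvalue,
           ROW_NUMBER() OVER (
               PARTITION BY i.issueid
               ORDER BY cg.updated DESC, ci.id DESC  -- newest group first, last item breaks ties
           ) AS seqnum
    FROM issue i
    JOIN changegroup cg ON cg.issueid = i.issueid
    JOIN changeitem ci ON ci.changegroupid = cg.changegroupid
    WHERE UPPER(ci.field) = 'ONE'
) AS ranked
WHERE seqnum = 1
ORDER BY issueid
"""

results = conn.execute(LATEST_SQL).fetchall()
for row in results:
    print(row)
```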

How to join transactional data with customer data tables and perform case-based operations in SQL

I'm trying to query two tables and apply case-by-case pricing rules to produce a list of call charges for a specific month.
Here are my tables:
Customer table:
+----+----------------+------------+
| id | name | number |
+----+----------------+------------+
| 1 | John Doe | 8973221232 |
| 2 | American Dad | 7165531212 |
| 3 | Michael Clean | 8884731234 |
| 4 | Samuel Gatsby | 9197543321 |
| 5 | Mike Chat | 8794029819 |
+----+----------------+------------+
Transaction data:
+----------+------------+------------+----------+---------------------+
| trans_id | incoming | outgoing | duration | date_time |
+----------+------------+------------+----------+---------------------+
| 1 | 8973221232 | 9197543321 | 64 | 2018-03-09 01:08:09 |
| 2 | 3729920490 | 7651113929 | 276 | 2018-07-20 05:53:10 |
| 3 | 8884731234 | 8973221232 | 382 | 2018-05-02 13:12:13 |
| 4 | 8973221232 | 9234759208 | 127 | 2018-07-07 15:32:30 |
| 5 | 7165531212 | 9197543321 | 852 | 2018-08-02 07:40:23 |
| 6 | 8884731234 | 9833823023 | 774 | 2018-07-03 14:27:52 |
| 7 | 8273820928 | 2374987349 | 120 | 2018-07-06 05:27:44 |
| 8 | 8973221232 | 9197543321 | 79 | 2018-07-30 12:51:55 |
| 9 | 7165531212 | 7651113929 | 392 | 2018-05-22 02:27:38 |
| 10 | 5423541524 | 7165531212 | 100 | 2018-07-21 22:12:20 |
| 11 | 9197543321 | 2983479820 | 377 | 2018-07-20 17:46:36 |
| 12 | 8973221232 | 7651113929 | 234 | 2018-07-09 03:32:53 |
| 13 | 7165531212 | 2309483932 | 88 | 2018-07-16 16:22:21 |
| 14 | 8973221232 | 8884731234 | 90 | 2018-09-03 13:10:00 |
| 15 | 3820838290 | 2093482348 | 238 | 2018-04-12 21:59:01 |
+----------+------------+------------+----------+---------------------+
What am I trying to accomplish?
I'm trying to compile a list of "costs" for each customer who made or received calls in July 2018. The costs are based on the following rules:
1) If the customer received a call (incoming), the cost of the call is equal to its duration;
2) If the customer made a call (outgoing), the cost is 100 if the duration is 30 or less. If it exceeds 30, the cost is 100 plus 5 times the duration beyond 30.
If the customer didn't make or receive any calls during that month, they shouldn't be on the list.
Examples:
1) Customer American Dad has 3 incoming calls and 1 outgoing call; however, only trans_id 10 and 13 are in July. He should pay a total of 538:
for trans_id 10 = 450 (100 for the first 30s + 5 * 70 for the remaining)
for trans_id 13 = 88
2) Customer Samuel Gatsby has 1 incoming call and 3 outgoing calls; however, only trans_id 8 and 11 are in July. He should pay a total of 722:
for trans_id 8 = 345 (100 for the first 30s + 5 * 49 for the remaining)
for trans_id 11 = 377
Considering only these two examples, the output would be:
+----+----------------+------------+------------+
| id | name | number | billable |
+----+----------------+------------+------------+
| 2 | American Dad | 7165531212 | 538 |
| 4 | Samuel Gatsby | 9197543321 | 722 |
+----+----------------+------------+------------+
Note: Mike Chat shouldn't be on the list as he didn't make or receive any calls for that specific month.
What have I tried so far?
I've been going around in circles with this one. I'm using the number as the unique ID. I've tried a full outer join, combining rows where incoming or outgoing is not null and then applying the rules case by case, and a left join with CASE expressions, but I can't get to a final list: whenever I get the incoming or the outgoing calls, I either can't apply the CASE or can't get both together. I'd really appreciate the help!
select customer_name.name, customer_name.number, bill = (CASE
WHEN customer_name.number = transaction_data.incoming then 'sum bill'
else 'multiply and add'
end)
from customer_name
left join transaction_data on customer_name.number = transaction_data.incoming or customer_name.name = transaction_data.outgoing
where strftime('%Y-%m', transaction_data.date_time) = '2018-07'
Note: I'm using SQLite to try this out online, but the database is on SQL Server 2012. I know date handling would be easier there, but I'd like to stay as close to T-SQL as possible.
I also tried a CASE expression to determine whether each call is incoming or outgoing, but I only get incoming results, even though trans_id 10 is outgoing:
select name, number, duration, case
when customer_name.number = transaction_data.incoming then 'incoming'
when customer_name.number = transaction_data.outgoing then 'outgoing'
END direction
from customer_name
left join transaction_data on customer_name.number = transaction_data.incoming or customer_name.name = transaction_data.outgoing
where strftime('%Y-%m', transaction_data.date_time) = '2018-07'
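The reason you only see incoming calls is the join condition in your queries: it compares customer_name.name, not customer_name.number, with transaction_data.outgoing, so outgoing calls never match. Join on the number for both directions and apply the pricing rules in a CASE inside SUM. Try this: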
SELECT
c."name", c.number,
SUM(CASE c.number
WHEN t.incoming THEN t.duration
ELSE IIF(t.duration - 30 < 0, 0, t.duration - 30) * 5 + 100
END) AS billable
FROM Customer AS c INNER JOIN [Transaction] AS t
ON c.number IN(t.incoming, t.outgoing)
WHERE t.date_time >= '20180701' AND t.date_time < '20180801'
GROUP BY c."name", c.number
Output:
+---------------+------------+----------+
| name | number | billable |
+---------------+------------+----------+
| John Doe | 8973221232 | 440 |
| American Dad | 7165531212 | 538 |
| Michael Clean | 8884731234 | 774 |
| Samuel Gatsby | 9197543321 | 722 |
+---------------+------------+----------+
Test it online with SQL Fiddle.
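For trying this in SQLite, here is a sketch of the same query run through Python's sqlite3 module on the question's full sample data (table names follow the question's own attempts). Two assumptions differ from the SQL Server answer: the dates are stored as text, so the range uses '2018-07-01' and '2018-08-01' (a literal like '20180701' would compare incorrectly against '2018-07-...' strings), and IIF is replaced by a nested CASE, which works in both engines.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer_name (id INTEGER PRIMARY KEY, name TEXT, number TEXT);
CREATE TABLE transaction_data (
    trans_id INTEGER PRIMARY KEY, incoming TEXT, outgoing TEXT,
    duration INTEGER, date_time TEXT
);
INSERT INTO customer_name VALUES
  (1, 'John Doe', '8973221232'), (2, 'American Dad', '7165531212'),
  (3, 'Michael Clean', '8884731234'), (4, 'Samuel Gatsby', '9197543321'),
  (5, 'Mike Chat', '8794029819');
INSERT INTO transaction_data VALUES
  (1, '8973221232', '9197543321', 64, '2018-03-09 01:08:09'),
  (2, '3729920490', '7651113929', 276, '2018-07-20 05:53:10'),
  (3, '8884731234', '8973221232', 382, '2018-05-02 13:12:13'),
  (4, '8973221232', '9234759208', 127, '2018-07-07 15:32:30'),
  (5, '7165531212', '9197543321', 852, '2018-08-02 07:40:23'),
  (6, '8884731234', '9833823023', 774, '2018-07-03 14:27:52'),
  (7, '8273820928', '2374987349', 120, '2018-07-06 05:27:44'),
  (8, '8973221232', '9197543321', 79, '2018-07-30 12:51:55'),
  (9, '7165531212', '7651113929', 392, '2018-05-22 02:27:38'),
  (10, '5423541524', '7165531212', 100, '2018-07-21 22:12:20'),
  (11, '9197543321', '2983479820', 377, '2018-07-20 17:46:36'),
  (12, '8973221232', '7651113929', 234, '2018-07-09 03:32:53'),
  (13, '7165531212', '2309483932', 88, '2018-07-16 16:22:21'),
  (14, '8973221232', '8884731234', 90, '2018-09-03 13:10:00'),
  (15, '3820838290', '2093482348', 238, '2018-04-12 21:59:01');
""")

BILL_SQL = """
SELECT c.id, c.name, c.number,
       SUM(CASE
             -- rule 1: the customer is on the incoming side, cost = duration
             WHEN c.number = t.incoming THEN t.duration
             -- rule 2: outgoing side, 100 plus 5 per unit of duration beyond 30
             ELSE 100 + 5 * (CASE WHEN t.duration > 30 THEN t.duration - 30 ELSE 0 END)
           END) AS billable
FROM customer_name AS c
JOIN transaction_data AS t ON c.number IN (t.incoming, t.outgoing)
WHERE t.date_time >= '2018-07-01' AND t.date_time < '2018-08-01'
GROUP BY c.id, c.name, c.number
ORDER BY c.id
"""

rows = conn.execute(BILL_SQL).fetchall()
for row in rows:
    print(row)
```

Because the join is an inner join on either number, a call between two customers (such as trans_id 8) is billed once to each side, and customers with no July calls (Mike Chat) drop out automatically.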

Error when grouping by two columns in Spark SQL

I am new to Spark SQL and was experimenting with some queries.
This is the query I am trying to execute:
sqlContext.sql("SELECT id, category, AVG(mark) FROM data GROUP BY id, category")
I am not getting the correct output when I run it: instead of the actual category values I get values such as 1, 2, 3.
I have been stuck on this strange error for a long time, but a simple SELECT and a query with a single GROUP BY column both work perfectly:
sqlContext.sql("SELECT id, category FROM data")
sqlContext.sql("SELECT id, AVG(mark) FROM data GROUP BY id")
What is wrong? Does Spark SQL have a problem with grouping by multiple columns?
Right now I am running this more complex query:
sqlContext.sql("SELECT data.id, data.category, AVG(id_avg.met_avg) FROM (SELECT id, AVG(mark) AS met_avg FROM data GROUP BY id) AS id_avg, data GROUP BY data.category, data.id")
This works, but it takes much longer to execute.
Please help.
Sample data:
|id | category | marks
| 1 | a | 40
| 2 | b | 44
| 3 | a | 50
| 4 | b | 40
| 1 | a | 30
The output should be:
|id | category | avg
| 1 | a | 35
| 2 | b | 44
| 3 | a | 50
| 4 | b | 40
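Please try this query. It needs only a single GROUP BY. The workaround in the question joins the subquery to data without any join condition, which produces a Cartesian product: that makes it slow, and every group ends up averaging the per-id averages of all ids. Instead: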
SELECT
data.id
, data.category
, AVG(mark)
FROM data
GROUP BY
data.id
, data.category
On your sample data this returns exactly the expected output shown above, for example 35 for id 1 (the average of 40 and 30). A row such as | 5 | a | 30, however, cannot be produced with GROUP BY, because there is no id 5 in the data.
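To check what the query should return, here is a sketch that runs it on the sample data in SQLite through Python's sqlite3 module. This is SQLite, not Spark SQL, so it only shows the expected result, not Spark's behavior; it uses the column name `mark` from the question's query (the sample table header says `marks`). It also runs the question's workaround to show the Cartesian product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE data (id INTEGER, category TEXT, mark INTEGER);
INSERT INTO data VALUES (1, 'a', 40), (2, 'b', 44), (3, 'a', 50), (4, 'b', 40), (1, 'a', 30);
""")

# Grouping by both columns: one row per (id, category) pair.
grouped = conn.execute("""
    SELECT id, category, AVG(mark) FROM data
    GROUP BY id, category ORDER BY id
""").fetchall()

# The question's workaround: the comma join has no join condition, so every
# data row is paired with every per-id average (a Cartesian product).
CROSS = "(SELECT id, AVG(mark) AS met_avg FROM data GROUP BY id) AS id_avg, data"
cross_rows = conn.execute(f"SELECT COUNT(*) FROM {CROSS}").fetchone()[0]
workaround = conn.execute(f"""
    SELECT data.id, data.category, AVG(id_avg.met_avg) FROM {CROSS}
    GROUP BY data.category, data.id ORDER BY data.id
""").fetchall()

print(grouped)
print(cross_rows, workaround)
```

The plain GROUP BY yields 35, 44, 50 and 40, while the workaround joins 4 averages with 5 rows (20 rows) and returns the same overall average for every group.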
That is a bug in Spark SQL, and it is fixed in a later version.
I got the correct output with Spark 1.0.2, and it also worked with plain Scala code. Try either of them :)

Aggregate function across two tables

For further processing I need a query that calculates several aggregates across two (possibly more) tables. But as soon as I include more than one table, I get odd results caused by the JOIN. First I used this query:
SELECT
sum(s.bedarf2050_kwh_a) AS bedarf_kWh_a,
sum(s.bedarf2050_kwh_a)*0.2 AS netzverlust,
sum(s.bedarf2050_kwh_a) + sum(s.bedarf2050_kwh_a)*0.2 AS gesamtbedarf,
sum(pv.modulflaeche_qm) AS instbar_modulflaeche_qm
FROM
siedlungsareale_wbm s, pv_st_potenziale_gis pv
WHERE
s.vg_solar LIKE '%NWS 2%'
AND
ST_Covers(s.geom, pv.geom);
Using SUM with DISTINCT returns accurate values, but only if all input values are unique, so it's not a solution I can use:
SELECT
SUM(DISTINCT s.bedarf2050_kwh_a) AS bedarf_kWh_a,
SUM(DISTINCT s.bedarf2050_kwh_a)*0.2 AS netzverlust,
SUM(DISTINCT s.bedarf2050_kwh_a) + SUM(DISTINCT s.bedarf2050_kwh_a)*0.2 AS gesamtbedarf,
SUM(pv.modulflaeche_qm) AS instbar_modulflaeche_qm,
(SUM(DISTINCT s.bedarf2050_kwh_a) + SUM(DISTINCT s.bedarf2050_kwh_a)*0.2)*0.01499 AS startwert_speichergroesse
FROM
siedlungsareale_wbm s, pv_st_potenziale_gis pv
WHERE
pv.vg_solar LIKE '%NWS 2%'
AND
ST_Covers(s.geom, pv.geom);
DISTINCT would work if it could be applied to another column rather than to the column being summed. A subquery or a different JOIN condition might also work, but everything I tried resulted in errors or wrong values.
I found some solutions that use UNION to compute aggregates across multiple tables, but when I tried to adapt them to my query I got errors.
For example, this one:
Can SQL calculate aggregate functions across multiple tables?
I hope someone can help me build a working query for this task.
[EDIT] simple example
siedlungsareale
id | bedarf2050_kWh_a | a | b | c | vg_solar | geom
---|------------------|---|---|---|----------|-----
1 | 20 | | | | NWS 2 | xxxxx
2 | 10 | | | | NWS 2 | xxxxx
3 | 30 | | | | NWS 2 | xxxxx
4 | 5 | | | | NWS 2 | xxxxx
5 | 15 | | | | NWS 2 | xxxxx
sum = 80
pv_st_potenziale_gis
id | modulflaeche_qm | x | y | z | geom
---|------------------|---|---|---|---------
1 | 10 | | | | xxxxx
2 | 10 | | | | xxxxx
3 | 20 | | | | xxxxx
4 | 10 | | | | xxxxx
5 | 30 | | | | xxxxx
6 | 30 | | | | xxxxx
7 | 10 | | | | xxxxx
8 | 10 | | | | xxxxx
9 | 10 | | | | xxxxx
10 | 10 | | | | xxxxx
sum = 150
SELECT sum(s.bedarfxxxx) AS bedarf, sum(pv.mflaeche) As mflaeche
FROM siedlungsareale s, pv_st_potenziale_gis pv
WHERE s.vg_solar LIKE '%NWS 2%' AND ST_Covers(s.geom,pv.geom);
Expected correct result:
bedarf | mflaeche
---------|----------
80 | 150
That is, I would get the sum of all 'bedarf' values from 'siedlungsareale' and of all 'mflaeche' values from 'pv_st_potenziale_gis'.
But the 'bedarf' values this query actually returns are much higher, because the join repeats each 'siedlungsareale' row once for every 'pv_st_potenziale_gis' row it covers.
And the other query:
SELECT sum(DISTINCT s.bedarfxxxx) AS bedarf, sum(DISTINCT pv.mflaeche) As mflaeche
FROM siedlungsareale s, pv_st_potenziale_gis pv
WHERE s.vg_solar LIKE '%NWS 2%' AND ST_Covers(s.geom,pv.geom);
returns:
bedarf | mflaeche
---------|-----------
80 | 60
The value for 'bedarf' is accurate because its values are unique, but for 'mflaeche', where some values occur several times, the result is wrong.
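One common way around this join fan-out is to aggregate each table on its own and only then combine the results. The sketch below shows that technique in SQLite through Python's sqlite3 module, using the simplified example tables with the pv values exactly as listed. SQLite has no ST_Covers, so a hypothetical `area_id` column (the area covering each PV row) stands in for `ST_Covers(s.geom, pv.geom)`; in PostGIS you would put the ST_Covers call where `s.id = pv.area_id` appears. The EXISTS subquery is a semi-join, so a PV row is counted once even if several areas covered it. 'bedarf' is summed over all 'NWS 2' areas, matching the expected result as stated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE siedlungsareale (id INTEGER PRIMARY KEY, bedarf REAL, vg_solar TEXT);
-- area_id is a hypothetical stand-in for ST_Covers(s.geom, pv.geom).
CREATE TABLE pv_st_potenziale_gis (id INTEGER PRIMARY KEY, mflaeche REAL, area_id INTEGER);
INSERT INTO siedlungsareale VALUES
  (1, 20, 'NWS 2'), (2, 10, 'NWS 2'), (3, 30, 'NWS 2'), (4, 5, 'NWS 2'), (5, 15, 'NWS 2');
INSERT INTO pv_st_potenziale_gis VALUES
  (1, 10, 1), (2, 10, 1), (3, 20, 2), (4, 10, 2), (5, 30, 3),
  (6, 30, 3), (7, 10, 4), (8, 10, 4), (9, 10, 5), (10, 10, 5);
""")

# Joining first repeats every area once per PV row it covers, inflating SUM(s.bedarf).
naive = conn.execute("""
    SELECT SUM(s.bedarf), SUM(pv.mflaeche)
    FROM siedlungsareale s, pv_st_potenziale_gis pv
    WHERE s.vg_solar LIKE '%NWS 2%' AND s.id = pv.area_id
""").fetchone()

# Aggregating each table separately avoids the fan-out.
fixed = conn.execute("""
    SELECT
      (SELECT SUM(s.bedarf) FROM siedlungsareale s
        WHERE s.vg_solar LIKE '%NWS 2%') AS bedarf,
      (SELECT SUM(pv.mflaeche) FROM pv_st_potenziale_gis pv
        WHERE EXISTS (SELECT 1 FROM siedlungsareale s
                      WHERE s.vg_solar LIKE '%NWS 2%' AND s.id = pv.area_id)) AS mflaeche
""").fetchone()

print(naive, fixed)
```

Derived columns such as netzverlust or gesamtbedarf can then be computed from these two totals in an outer SELECT.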

Simple psql count query

I am very new to PostgreSQL and would like to generate some summary data from our tables.
We have a simple message board: a table named messages with a column msg_ctg_uid. Each msg_ctg_uid corresponds to a category name in the categories table.
Here are the categories (SELECT * FROM categories ORDER BY ctg_uid ASC;):
ctg_uid | ctg_category | ctg_creator_uid
---------+--------------------+-----------------
1 | general | 1
2 | faults | 1
3 | computing | 1
4 | teaching | 2
5 | QIS-FEEDBACK | 3
6 | QIS-PHYS-FEEDBACK | 3
7 | SOP-?-CHANGE | 3
8 | agenda items | 7
10 | Acq & Process | 2
12 | physics-jobs | 3
13 | Tech meeting items | 12
16 | incident-forms | 3
17 | ERRORS | 3
19 | Files | 10
21 | QIS-CAR | 3
22 | doses | 4
24 | admin | 3
25 | audit | 3
26 | For Sale | 4
31 | URGENT-REPORTS | 4
34 | dt-jobs | 3
35 | JOBS | 3
36 | IN-PATIENTS | 4
37 | Ordering | 4
38 | dep-meetings | 4
39 | reporting | 4
What I would like to do is count, across all messages, how often each category occurs.
I can do it one category at a time:
SELECT count(msg_ctg_uid) FROM messages where msg_ctg_uid='13';
However, is it possible to do this in a single query?
The following gives the category and msg_ctg_uid for each message:
SELECT ctg_category, msg_ctg_uid FROM messages INNER JOIN categories ON (ctg_uid = msg_ctg_uid);
But
SELECT ctg_category, count(msg_ctg_uid) FROM messages INNER JOIN categories ON (ctg_uid = msg_ctg_uid);
gives me the error:
ERROR: column "categories.ctg_category" must appear in the GROUP BY clause or be used in an aggregate function
How do I aggregate the frequency of each category ?
You're missing the GROUP BY clause:
SELECT ctg_category, count(msg_ctg_uid)
FROM messages INNER JOIN categories ON (ctg_uid = msg_ctg_uid)
GROUP BY ctg_category;
This tells PostgreSQL that you want one count per ctg_category.
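A quick way to check the grouped count is the sketch below, which runs it in SQLite through Python's sqlite3 module on a few hypothetical categories and messages (the `msg_uid` column is an assumption; only msg_ctg_uid comes from the question). It also shows a LEFT JOIN variant, not part of the answer above, that additionally lists categories with no messages.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE categories (ctg_uid INTEGER PRIMARY KEY, ctg_category TEXT);
-- msg_uid is a hypothetical key; only msg_ctg_uid matters here.
CREATE TABLE messages (msg_uid INTEGER PRIMARY KEY, msg_ctg_uid INTEGER);
INSERT INTO categories VALUES (1, 'general'), (2, 'faults'), (3, 'computing');
INSERT INTO messages (msg_ctg_uid) VALUES (1), (1), (2), (1);
""")

# One row per category that has at least one message.
per_category = conn.execute("""
    SELECT ctg_category, count(msg_ctg_uid)
    FROM messages INNER JOIN categories ON (ctg_uid = msg_ctg_uid)
    GROUP BY ctg_category
    ORDER BY ctg_category
""").fetchall()

# Starting from categories with a LEFT JOIN keeps empty categories; count(column)
# ignores the NULLs produced for unmatched rows, so they get a count of 0.
with_empty = conn.execute("""
    SELECT ctg_category, count(msg_ctg_uid)
    FROM categories LEFT JOIN messages ON (ctg_uid = msg_ctg_uid)
    GROUP BY ctg_category
    ORDER BY ctg_category
""").fetchall()

print(per_category)
print(with_empty)
```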