Aggregate function across two tables - sql

For a later processing step I need a query that calculates several aggregate functions across two (maybe more) tables. But as soon as I bring in more than one table, I get odd results caused by the JOIN conditions. First I used this query:
SELECT
sum(s.bedarf2050_kwh_a) AS bedarf_kWh_a,
sum(s.bedarf2050_kwh_a)*0.2 AS netzverlust,
sum(s.bedarf2050_kwh_a) + sum(s.bedarf2050_kwh_a)*0.2 AS gesamtbedarf,
sum(pv.modulflaeche_qm) AS instbar_modulflaeche_qm
FROM
siedlungsareale_wbm s, pv_st_potenziale_gis pv
WHERE
s.vg_solar LIKE '%NWS 2%'
AND
ST_Covers(s.geom, pv.geom);
Using SUM with DISTINCT returns accurate values, but only if all input values are unique. That's not a solution I can use:
SELECT
SUM(DISTINCT s.bedarf2050_kwh_a) AS bedarf_kWh_a,
SUM(DISTINCT s.bedarf2050_kwh_a)*0.2 AS netzverlust,
SUM(DISTINCT s.bedarf2050_kwh_a) + SUM(DISTINCT s.bedarf2050_kwh_a)*0.2 AS gesamtbedarf,
SUM(pv.modulflaeche_qm) AS instbar_modulflaeche_qm,
(SUM(DISTINCT s.bedarf2050_kwh_a) + SUM(DISTINCT s.bedarf2050_kwh_a)*0.2)*0.01499 AS startwert_speichergroesse
FROM
siedlungsareale_wbm s, pv_st_potenziale_gis pv
WHERE
pv.vg_solar LIKE '%NWS 2%'
AND
ST_Covers(s.geom, pv.geom);
DISTINCT would be a proper solution if it could refer to another column rather than the column passed to the function. A subquery or a different JOIN condition might also work, but everything I tried ended in errors or wrong result values.
I found some solutions that use UNION to apply aggregate functions across multiple tables, but when I tried to adapt the code to my query I got errors.
For example this one:
Can SQL calculate aggregate functions across multiple tables?
I hope someone can help me build a working query for my task.
[EDIT] simple example
siedlungsareale
id | bedarf2050_kWh_a | a | b | c | vg_solar | geom
---|------------------|---|---|---|----------|-----
1 | 20 | | | | NWS 2 | xxxxx
2 | 10 | | | | NWS 2 | xxxxx
3 | 30 | | | | NWS 2 | xxxxx
4 | 5 | | | | NWS 2 | xxxxx
5 | 15 | | | | NWS 2 | xxxxx
sum = 80
pv_st_potenziale_gis
id | modulflaeche_qm | x | y | z | geom
---|------------------|---|---|---|---------
1 | 10 | | | | xxxxx
2 | 10 | | | | xxxxx
3 | 20 | | | | xxxxx
4 | 10 | | | | xxxxx
5 | 30 | | | | xxxxx
6 | 30 | | | | xxxxx
7 | 10 | | | | xxxxx
8 | 10 | | | | xxxxx
9 | 10 | | | | xxxxx
10 | 10 | | | | xxxxx
sum = 150
SELECT sum(s.bedarfxxxx) AS bedarf, sum(pv.mflaeche) As mflaeche
FROM siedlungsareale s, pv_st_potenziale_gis pv
WHERE s.vg_solar LIKE '%NWS 2%' AND ST_Covers(s.geom,pv.geom);
Expected correct result:
bedarf | mflaeche
---------|----------
80 | 150
That is, I want the sum of all values of column 'bedarf' from 'siedlungsareale' and the sum of all values of 'mflaeche' from 'pv_st_potenziale_gis'.
But the values this query actually calculates for column 'bedarf' are much higher because of the CROSS JOIN.
And the other query:
SELECT sum(DISTINCT s.bedarfxxxx) AS bedarf, sum(DISTINCT pv.mflaeche) As mflaeche
FROM siedlungsareale s, pv_st_potenziale_gis pv
WHERE s.vg_solar LIKE '%NWS 2%' AND ST_Covers(s.geom,pv.geom);
returns:
bedarf | mflaeche
---------|-----------
80 | 60
The value for 'bedarf' is accurate because its values are unique. But for 'mflaeche', where some values occur several times, the result is wrong.
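To make the effect described above reproducible, here is a minimal sketch using Python's built-in sqlite3 module. It rests on assumptions that are not in the question: a hypothetical integer key `area_id` stands in for the PostGIS predicate `ST_Covers(s.geom, pv.geom)`, and each area is assumed to cover exactly two PV rows. It compares the plain join, the SUM(DISTINCT ...) variant, and one possible workaround, namely aggregating the PV table per area before joining.

```python
import sqlite3

# Assumption: plain SQLite with a hypothetical key column area_id replaces
# the PostGIS predicate ST_Covers(s.geom, pv.geom) from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE siedlungsareale (id INTEGER PRIMARY KEY, bedarf INTEGER);
    CREATE TABLE pv_st_potenziale_gis (
        id INTEGER PRIMARY KEY, area_id INTEGER, mflaeche INTEGER);
    INSERT INTO siedlungsareale VALUES
        (1, 20), (2, 10), (3, 30), (4, 5), (5, 15);
    INSERT INTO pv_st_potenziale_gis VALUES
        (1, 1, 10), (2, 1, 10), (3, 2, 20), (4, 2, 10), (5, 3, 30),
        (6, 3, 30), (7, 4, 10), (8, 4, 10), (9, 5, 10), (10, 5, 10);
""")

# Plain join: every area row is repeated once per matching PV row,
# so SUM(s.bedarf) counts each area several times.
naive = conn.execute("""
    SELECT SUM(s.bedarf), SUM(pv.mflaeche)
    FROM siedlungsareale s
    JOIN pv_st_potenziale_gis pv ON pv.area_id = s.id
""").fetchone()

# SUM(DISTINCT ...): collapses equal values, which is wrong for mflaeche.
distinct = conn.execute("""
    SELECT SUM(DISTINCT s.bedarf), SUM(DISTINCT pv.mflaeche)
    FROM siedlungsareale s
    JOIN pv_st_potenziale_gis pv ON pv.area_id = s.id
""").fetchone()

# Pre-aggregation: reduce the PV table to one row per area first, so the
# join no longer multiplies the area rows.
pre_aggregated = conn.execute("""
    SELECT SUM(s.bedarf), SUM(p.mflaeche)
    FROM siedlungsareale s
    JOIN (SELECT area_id, SUM(mflaeche) AS mflaeche
          FROM pv_st_potenziale_gis
          GROUP BY area_id) p ON p.area_id = s.id
""").fetchone()

print(naive, distinct, pre_aggregated)
```

With a spatial predicate the same idea would presumably apply: aggregate the PV rows per covering area in a subquery, then sum over the areas. That part is untested here.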

Related

In Oracle SQL, how can I find all values in one column for which more than one distinct value exists in another column?

I have an Oracle table like this
| id | code | info | More cols |
|----|------|------------------|-----------|
| 1 | 13 | The Thirteen | dggf |
| 1 | 18 | The Eighteen | ghdgffg |
| 1 | 18 | The Eighteen | |
| 1 | 9 | The Nine | ghdfgjgf |
| 1 | 9 | Die Neun | ghdfgjgf |
| 1 | 75 | The Seventy-five | ghfgh |
| 1 | 75 | The Seventy-five | ghfgh |
| 1 | 2 | The Two | ghfgh |
| 1 | 27 | The Twenty-Seven | |
| 1 | 27 | The Twenty-Seven | |
| 1 | 27 | el veintisiete | fghfg |
| . | . | . | . |
| . | . | . | . |
| . | . | . | . |
In this table I want to find all rows with values in column code which have more than one distinct value in the info column. So from the listed rows this would be the values 9 and 27 and the associated rows.
I tried to construct a first query like
SELECT code FROM mytable
WHERE COUNT(DISTINCT info) >1
but I get an "ORA-00934: group function is not allowed here" error. Also, I don't know how to express the condition COUNT(DISTINCT info) "with a fixed postcode".
You need HAVING together with GROUP BY; aggregate functions are not allowed in the WHERE clause:
SELECT code
FROM mytable
group by code
having COUNT(DISTINCT info) >1
I would write your query as:
SELECT code
FROM yourTable
GROUP BY code
HAVING MIN(info) <> MAX(info);
Writing the HAVING logic this way leaves the query sargable, meaning that an index on (code, info) should be usable.
You could also do this using exists logic:
SELECT DISTINCT code
FROM yourTable t1
WHERE EXISTS (SELECT 1 FROM yourTable t2 WHERE t2.code = t1.code AND t2.info <> t1.info);
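As a quick cross-check of the three variants above, here is a small sketch that runs them against the sample rows. It uses Python's sqlite3 module as a stand-in, so it says nothing about Oracle's execution plans or index usage; the table name `mytable` and the omission of the "More cols" column are simplifications for illustration.

```python
import sqlite3

# Assumption: SQLite stands in for Oracle; only the columns that matter
# for the question (id, code, info) are reproduced.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER, code INTEGER, info TEXT)")
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", [
    (1, 13, "The Thirteen"),
    (1, 18, "The Eighteen"), (1, 18, "The Eighteen"),
    (1, 9, "The Nine"), (1, 9, "Die Neun"),
    (1, 75, "The Seventy-five"), (1, 75, "The Seventy-five"),
    (1, 2, "The Two"),
    (1, 27, "The Twenty-Seven"), (1, 27, "The Twenty-Seven"),
    (1, 27, "el veintisiete"),
])


def codes(sql):
    """Run a query and return its first column as a list."""
    return [row[0] for row in conn.execute(sql)]


# Variant 1: count the distinct info values per code.
by_count = codes("""
    SELECT code FROM mytable
    GROUP BY code
    HAVING COUNT(DISTINCT info) > 1
    ORDER BY code
""")

# Variant 2: a code has several info values exactly when MIN and MAX differ.
by_min_max = codes("""
    SELECT code FROM mytable
    GROUP BY code
    HAVING MIN(info) <> MAX(info)
    ORDER BY code
""")

# Variant 3: look for another row with the same code but different info.
by_exists = codes("""
    SELECT DISTINCT t1.code
    FROM mytable t1
    WHERE EXISTS (SELECT 1 FROM mytable t2
                  WHERE t2.code = t1.code AND t2.info <> t1.info)
    ORDER BY t1.code
""")

print(by_count, by_min_max, by_exists)
```

To get the associated rows rather than just the codes, any of these can be used as a subquery, for example `WHERE code IN (...)`.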

SQL - How to transform a table with a range of values into another with all the numbers in that range?

I have a Table (A) with some intervals from start_val to end_val with an attribute for that range of values.
I want a Table (B) in which each row is a number in the interval of start_val to end_val with the attribute of that range.
I need to do that using SQL.
Example
Table A:
+---------+--------+----------+
|start_val| end_val| attribute|
+---------+--------+----------+
| 10 | 12 | 1 |
| 20 | 23 | 2 |
+---------+--------+----------+
Table B (Expected result):
+---------+----------+
| interv  | attribute|
+---------+----------+
| 10 | 1 |
| 11 | 1 |
| 12 | 1 |
| 20 | 2 |
| 21 | 2 |
| 22 | 2 |
| 23 | 2 |
+---------+----------+
Here is a way to do this
select m.start_val + n -1 as start_val_computed
,m.attribute
from t m
join lateral generate_series(1,(m.end_val-m.start_val)+1) n
on 1=1
+--------------------+-----------+
| start_val_computed | attribute |
+--------------------+-----------+
| 10 | 1 |
| 11 | 1 |
| 12 | 1 |
| 20 | 2 |
| 21 | 2 |
| 22 | 2 |
| 23 | 2 |
+--------------------+-----------+
working example
https://dbfiddle.uk/?rdbms=postgres_12&fiddle=ce9e13765b5a4c3616d95ec659c1dfc9
You may use a calendar table approach:
SELECT
t1.val,
t2.attribute
FROM generate_series(10, 23) AS t1(val)
INNER JOIN TableA t2
ON t1.val BETWEEN t2.start_val AND t2.end_val
ORDER BY
t2.attribute,
t1.val;
Note: You may expand the bounds in the above call to generate_series to cover whatever range you think your data would need.
This is a variant of George's solution, but it is a bit simpler:
select n, m.attribute
from t m cross join lateral
generate_series(m.start_val, m.end_val) n;
The changes are:
CROSS JOIN instead of JOIN. So, no need for an ON clause.
No arithmetic in the GENERATE_SERIES().
No arithmetic in the SELECT.
You can just call the result of GENERATE_SERIES() whatever name you want in the result set.
Postgres actually allows you to put GENERATE_SERIES() in the SELECT:
select generate_series(m.start_val, m.end_val) as n, m.attribute
from t m;
However, I am not a fan of putting row generating functions anywhere other than the FROM clause. I just find it confusing to figure out what the query is doing.
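The answers above all rely on Postgres's generate_series. For engines that lack it, a recursive CTE is a different technique that can produce the same rows. The sketch below is only an illustration, run through Python's sqlite3 module; the table name `table_a` and the CTE name `expanded` are made up for the example.

```python
import sqlite3

# Assumption: SQLite stands in for an engine without generate_series;
# table_a mirrors Table A from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (start_val INTEGER, end_val INTEGER, attribute INTEGER);
    INSERT INTO table_a VALUES (10, 12, 1), (20, 23, 2);
""")

rows = conn.execute("""
    WITH RECURSIVE expanded(val, end_val, attribute) AS (
        -- anchor: one row per interval, starting at start_val
        SELECT start_val, end_val, attribute FROM table_a
        UNION ALL
        -- step: keep adding 1 until the end of the interval is reached
        SELECT val + 1, end_val, attribute FROM expanded WHERE val < end_val
    )
    SELECT val, attribute FROM expanded ORDER BY attribute, val
""").fetchall()

for val, attribute in rows:
    print(val, attribute)
```

On Postgres itself the lateral generate_series versions are shorter and probably the better choice.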

SQL - divide one column by another

I have the following code:
select c.category
,sum(b.is_open) as open
,count(b.name) as total
from business b inner join category c on b.id=c.business_id
group by c.category
order by sum(b.is_open) desc
limit 10
which gives me following dataset:
+------------------+------+-------+
| category | open | total |
+------------------+------+-------+
| Restaurants | 53 | 71 |
| Shopping | 25 | 30 |
| Food | 20 | 23 |
| Health & Medical | 16 | 17 |
| Home Services | 15 | 16 |
| Beauty & Spas | 12 | 13 |
| Nightlife | 12 | 20 |
| Bars | 11 | 17 |
| Active Life | 10 | 10 |
| Local Services | 10 | 12 |
+------------------+------+-------+
However, if I change lines 2 and 3 to:
sum(b.is_open) / count(b.name) as '%'
instead of the actual ratios, I get zeroes throughout. I tried casting both columns to a decimal type (although it looks like they already were), but that did not work. Why can't I get the right results? I am writing my queries in SQLite.
Try using floating point arithmetic instead of integer arithmetic:
1.0 * sum(b.is_open) / count(b.name) as '%'
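A small sketch of what is going on, run through Python's sqlite3 module. The three-row `business` table is made up for the example. It also includes a cast to DECIMAL, which SQLite maps to NUMERIC affinity and therefore leaves an integer as an integer; that is probably why the cast in the question had no effect.

```python
import sqlite3

# Hypothetical miniature of the business table: two open, one closed.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE business (name TEXT, is_open INTEGER);
    INSERT INTO business VALUES ('a', 1), ('b', 1), ('c', 0);
""")

int_ratio, float_ratio, cast_decimal_ratio, cast_real_ratio = conn.execute("""
    SELECT
        SUM(is_open) / COUNT(name),                  -- integer / integer
        1.0 * SUM(is_open) / COUNT(name),            -- forced to floating point
        CAST(SUM(is_open) AS DECIMAL) / COUNT(name), -- still an integer in SQLite
        CAST(SUM(is_open) AS REAL) / COUNT(name)     -- REAL cast does work
    FROM business
""").fetchone()

print(int_ratio, float_ratio, cast_decimal_ratio, cast_real_ratio)
```

So in SQLite either multiply by 1.0 or cast to REAL; other engines treat DECIMAL differently.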

Error in executing two groupbys in sparkSQL

I am new to Spark SQL and was experimenting with a few queries.
This is the query I am trying to execute:
sqlContext.sql("SELECT id, category, AVG(mark) FROM data GROUP BY id, category")
I do not get the proper output when I run the query.
Instead of the actual value of category I get values such as 1, 2, 3.
I have been stuck on this weird error for a long time,
but when I run a simple SELECT statement or a single GROUP BY, it works perfectly:
sqlContext.sql("SELECT id, category FROM data")
sqlContext.sql("SELECT id, AVG(mark) FROM data GROUP BY id")
What is wrong? Does Spark SQL have a problem with grouping by multiple columns?
Right now I am running this more complex query:
sqlContext.sql("SELECT data.id, data.category, AVG(id_avg.met_avg) FROM (SELECT id, AVG(mark) AS met_avg FROM data GROUP BY id) AS id_avg, data GROUP BY data.category, data.id")
This works, but takes longer to execute.
Please help.
Sample data:
|id | category | marks
| 1 | a | 40
| 2 | b | 44
| 3 | a | 50
| 4 | b | 40
| 1 | a | 30
The output should be:
|id | category | avg
| 1 | a | 35
| 2 | b | 44
| 3 | a | 50
| 4 | b | 40
Please try this query:
SELECT
data.id
, data.category
, AVG(mark)
FROM data
GROUP BY
data.id
, data.category
Based on this sample data:
|id | category | marks
| 1 | a | 40
| 2 | b | 44
| 3 | a | 50
| 4 | b | 40
| 1 | a | 30
The output WILL be this:
|id | category | avg
| 1 | a | 35
| 2 | b | 44
| 3 | a | 50
| 4 | b | 40
and, the following expected row cannot be produced using group by:
| 5 | a | 30
That is a bug in Spark SQL.
Try the next version; it is fixed there.
I got the proper output by using spark-1.0.2.
It also worked with pure Scala code. Try either of them :)
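For reference, this is what a two-column GROUP BY should return for the sample data. The sketch uses Python's sqlite3 module purely as a stand-in to show the expected SQL semantics; it does not exercise Spark and says nothing about which Spark versions are affected.

```python
import sqlite3

# Assumption: SQLite stands in for Spark SQL, only to show what the
# two-column GROUP BY is expected to return for the sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE data (id INTEGER, category TEXT, mark INTEGER);
    INSERT INTO data VALUES
        (1, 'a', 40), (2, 'b', 44), (3, 'a', 50), (4, 'b', 40), (1, 'a', 30);
""")

rows = conn.execute("""
    SELECT id, category, AVG(mark)
    FROM data
    GROUP BY id, category
    ORDER BY id
""").fetchall()

for row in rows:
    print(row)
```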

SQL query to get the same set of results

This should be a simple one, but say I have a table with data like this:
| ID | Date | Value |
| 1 | 01/01/2013 | 40 |
| 2 | 03/01/2013 | 20 |
| 3 | 10/01/2013 | 30 |
| 4 | 14/02/2013 | 60 |
| 5 | 15/03/2013 | 10 |
| 6 | 27/03/2013 | 70 |
| 7 | 01/04/2013 | 60 |
| 8 | 01/06/2013 | 20 |
What I want is the sum of values per week of the year, showing ALL weeks (for use in an Excel graph).
What my query gives me is only the weeks that are actually in the database.
With SQL you cannot return rows that don't exist in some table. To get the effect you want you could create a table called WeeksInYear with only one field WeekNumber that is an Int. Populate the table with all the week numbers. Then JOIN that table to this one.
The query would then look something like the following:
SELECT w.WeekNumber, COALESCE(SUM(m.Value), 0) AS Total
FROM MyTable AS m
RIGHT OUTER JOIN WeeksInYear AS w
ON DATEPART(wk, m.date) = w.WeekNumber
GROUP BY w.WeekNumber
The missing weeks have no matching rows in MyTable, so SUM returns NULL for them; COALESCE turns that NULL into 0.
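Here is a minimal sketch of the weeks-table approach, run through Python's sqlite3 module rather than SQL Server. Several things are assumptions for the example: the dates are stored in ISO format, `strftime('%W', ...)` replaces `DATEPART(wk, ...)` (the two number weeks differently), the weeks table holds 0 to 53 to match the range of `%W`, and a LEFT JOIN from the weeks table replaces the RIGHT OUTER JOIN. The query returns both the raw SUM and a COALESCE-wrapped SUM so the difference for empty weeks is visible.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server; dates are rewritten from
# dd/mm/yyyy to ISO format so that strftime can parse them.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER, date TEXT, value INTEGER)")
conn.executemany("INSERT INTO my_table VALUES (?, ?, ?)", [
    (1, "2013-01-01", 40), (2, "2013-01-03", 20), (3, "2013-01-10", 30),
    (4, "2013-02-14", 60), (5, "2013-03-15", 10), (6, "2013-03-27", 70),
    (7, "2013-04-01", 60), (8, "2013-06-01", 20),
])

# Helper table with one row per possible week number (%W yields 00..53).
conn.execute("CREATE TABLE weeks_in_year (week_number INTEGER)")
conn.executemany("INSERT INTO weeks_in_year VALUES (?)",
                 [(w,) for w in range(54)])

rows = conn.execute("""
    SELECT w.week_number,
           SUM(m.value)              AS raw_total,    -- NULL for empty weeks
           COALESCE(SUM(m.value), 0) AS filled_total  -- 0 for empty weeks
    FROM weeks_in_year w
    LEFT JOIN my_table m
        ON CAST(strftime('%W', m.date) AS INTEGER) = w.week_number
    GROUP BY w.week_number
    ORDER BY w.week_number
""").fetchall()

raw_totals = [r[1] for r in rows]
filled_totals = [r[2] for r in rows]
print(len(rows), sum(filled_totals))
```

Every week appears exactly once; weeks without data carry None in the raw column and 0 in the filled one.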