I asked this question yesterday on how to calculate the number of customers in a grid square, and the solution I got was:
SELECT 10 * (customer_x / 10), 10 * (customer_y / 10), COUNT (*) FROM t_customer GROUP BY customer_x / 10, customer_y / 10 ORDER BY 3 DESC;
Now I need to present the grid squares that contain zero customers, and I'm not sure how to do this, as the query computes the squares from the customer coordinates and the grid doesn't actually exist in the table. Should I use an ISNULL() function? The results for this query in my current database are:
90|90|7
30|20|4
-20|-40|2
-10|-20|2
-10|-10|2
-40|-40|1
-40|-30|1
-40|30|1
-30|0|1
-20|0|1
-20|30|1
-10|-30|1
-10|40|1
0|-20|1
0|-10|1
0|0|1
0|10|1
0|40|1
10|20|1
20|20|1
30|-40|1
30|30|1
But given that there are 100 grid squares in the area there are many without customers. I just need a query that will show all but the above grid squares. I'm using SQLite3 and any help would be greatly appreciated. Those in the (90,90) grid square are to be ignored.
Sample data:
I need to have a list of grid squares with no customers in from the 100 grid squares in a 10 x 10 grid (-50 to +50 in increments of 10). It may be easier to use a virtual table of all possible grid squares and to subtract the above query from it?
To make an (empty) grid square show up, add a dummy row for it, with a customer_id of NULL. NULL values are not counted:
SELECT ...x...,
...y...,
COUNT(customer_id)
FROM t_customer
GROUP BY ...x..., ...y...;
If you cannot change the database, you have to generate the empty rows with a recursive common table expression:
WITH RECURSIVE range(i) AS (
    -- from -90 to +90 in steps of 10
    SELECT -90
    UNION ALL
    SELECT i + 10
    FROM range
    LIMIT 19
), empty(x, y, customer_id) AS (
    SELECT x.i,
           y.i,
           NULL
    FROM range AS x
    CROSS JOIN range AS y
)
SELECT ...x...,
       ...y...,
       COUNT(customer_id)
FROM (SELECT * FROM t_customer
      UNION ALL
      SELECT * FROM empty)
GROUP BY ...x..., ...y...;
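As a rough illustration of the "subtract the occupied squares from a virtual grid" idea raised in the question, here is a minimal Python sketch using the standard sqlite3 module. The function name empty_squares, the grid bounds (-50 to 40) and the three-column t_customer layout are assumptions made for the example, not taken from the original database. It keeps the question's binning expression; note that SQLite integer division truncates toward zero, so coordinates from -9 to 9 all land in the square labelled 0.

```python
import sqlite3

def empty_squares(conn, lo=-50, hi=40, step=10):
    """Return (x, y) labels of grid squares that contain no customer.

    lo, hi and step are assumed grid bounds; adjust them to the real area.
    """
    sql = """
    WITH RECURSIVE axis(i) AS (
        -- one value per grid line: lo, lo + step, ..., hi
        SELECT :lo
        UNION ALL
        SELECT i + :step FROM axis WHERE i + :step <= :hi
    ),
    occupied AS (
        -- same binning as the original query
        SELECT DISTINCT :step * (customer_x / :step) AS gx,
                        :step * (customer_y / :step) AS gy
        FROM t_customer
    )
    SELECT x.i, y.i
    FROM axis AS x
    CROSS JOIN axis AS y
    LEFT JOIN occupied AS o ON o.gx = x.i AND o.gy = y.i
    WHERE o.gx IS NULL          -- keep only squares with no match
    ORDER BY x.i, y.i
    """
    return conn.execute(sql, {"lo": lo, "hi": hi, "step": step}).fetchall()
```

Because the (90,90) square lies outside the generated grid, customers in it are ignored automatically.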
I have a case like the following picture
Say I have 9 polygons and want to get the polygons that have at most 3 neighboring polygons, such as polygons 1, 3, 7 and 9 (yellow).
I think this can be done using ST_Touches in PostGIS, but so far I have only come up with this:
select a.poly_name, b.poly_name from tb a, tb b where ST_Touches(a.geom, b.geom)
And say I want to output this like:
poly_name poly_name
1 2
1 4
1 5
So how can I get this done?
Your hint about ST_Touches is correct. However, to get the number of neighbor cells of each record from other records in the same table, you either need to run a subquery or reference the table twice in the FROM clause.
Given the following grid on a table called tb ..
.. you can filter the cells with three neighbor cells or less like this:
SELECT * FROM tb q1
WHERE (
SELECT count(*) FROM tb q2
WHERE ST_Touches(q2.geom,q1.geom)) <=3;
If you want to also list which are the neighbor cells you might wanna first join the cells that touch in the WHERE clause and in a subquery or CTE count the results:
WITH j AS (
SELECT
q1.poly_name AS p1,q2.poly_name p2,
COUNT(*) OVER (PARTITION BY q1.poly_name) AS qt
FROM tb q1, tb q2
WHERE ST_Touches(q2.geom,q1.geom))
SELECT * FROM j
WHERE qt <= 3;
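To see why the filter on three or fewer neighbors picks out the corner cells, here is a small plain-Python stand-in. It does not use PostGIS: it assumes a 3 x 3 grid of square cells numbered 1 to 9 row by row (the original picture is not available), and treats two cells as touching when they share an edge or a corner, which is how ST_Touches behaves for adjacent squares. The function names are hypothetical.

```python
from itertools import product

def touching_cells(rows=3, cols=3):
    """Map each cell number to the sorted numbers of the cells it touches."""
    # Assumed numbering: 1..rows*cols, row by row.
    cells = {r * cols + c + 1: (r, c)
             for r, c in product(range(rows), range(cols))}
    return {
        a: sorted(
            b for b, (rb, cb) in cells.items()
            # touching = sharing an edge or a corner, but not the cell itself
            if b != a and abs(ra - rb) <= 1 and abs(ca - cb) <= 1
        )
        for a, (ra, ca) in cells.items()
    }

def at_most(touching, limit=3):
    """Keep only the cells with `limit` neighbors or fewer."""
    return {cell: nbrs for cell, nbrs in touching.items() if len(nbrs) <= limit}
```

With these assumptions, cell 1 touches cells 2, 4 and 5, matching the pairs listed in the question.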
Demo: db<>fiddle
Further reading:
Create Hexagons (maybe relevant for your project)
Window Functions
I have a table match which looks like this (please see attached image). I wanted to retrieve a dataset that had a column of average values for home_goal and away_goal using this code
SELECT
m.country_id,
m.season,
m.home_goal,
m.away_goal,
AVG(m.home_goal + m.away_goal) AS avg_goal
FROM match AS m;
However, I got this error
column "m.country_id" must appear in the GROUP BY clause or be used in an aggregate function
LINE 3: m.country_id,
My question is: why was GROUP BY clause required? Why couldn't SQL know how to take average of two columns row by row?
Thank you.
try this:
SELECT
m.country_id,
m.season,
m.home_goal,
m.away_goal,
(m.home_goal + m.away_goal)/2 AS avg_goal
FROM match AS m;
You were asked for a GROUP BY because AVG(), much like SUM(), works on multiple values of one column; every selected column that is not inside an aggregate must then be listed in the GROUP BY.
You are looking to average two distinct columns: that is a row-wise operation rather than a column-wise one.
how to take average of two columns row by row?
You don't use AVG() for this; it is an aggregate function, that operates over a set of rows. Here, it seems like you just want a simple math computation:
SELECT
m.country_id,
m.season,
m.home_goal,
m.away_goal,
(m.home_goal + m.away_goal) / 2.0 AS avg_goal
FROM match AS m;
Note the decimal denominator (2.0): this avoids integer division in databases that implement it.
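The effect of the decimal denominator can be checked with a short sketch. This one runs against an in-memory SQLite database from Python (an assumption for the demo; the question appears to use a different database), with a cut-down "match" table holding only the two goal columns. The function name row_averages is made up for the example.

```python
import sqlite3

def row_averages(rows):
    """Return (integer-division average, decimal average) for each match row."""
    conn = sqlite3.connect(":memory:")
    # "match" is quoted because MATCH is also an SQL keyword in SQLite.
    conn.execute('CREATE TABLE "match" (home_goal INTEGER, away_goal INTEGER)')
    conn.executemany('INSERT INTO "match" VALUES (?, ?)', rows)
    return conn.execute(
        'SELECT (home_goal + away_goal) / 2   AS int_avg, '
        '       (home_goal + away_goal) / 2.0 AS dec_avg '
        'FROM "match" ORDER BY rowid'
    ).fetchall()
```

For a 3:2 result, the first expression yields 2 while the second yields 2.5.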
Avg in the context of the function mentioned above is calculating the average of the values of the columns and not the average of the two values in the same row. It is an aggregate function and that’s why the group by clause is required.
In order to take the average of two columns in the same row you need to divide by 2.
Let's consider the following table:
CREATE TABLE Numbers([x] int, [y] int, [category] nvarchar(10));
INSERT INTO Numbers ([x], [y], [category])
VALUES
(1, 11, 'odd'),
(2, 22, 'even'),
(3, 33, 'odd'),
(4, 44, 'even');
Here is an example of using two aggregate functions - AVG and SUM - with GROUP BY:
SELECT
Category,
AVG(x) as avg_x,
AVG(x+y) as avg_xy,
SUM(x) as sum_x,
SUM(x+y) as sum_xy
FROM Numbers
GROUP BY Category
The result has two rows:
Category  avg_x  avg_xy  sum_x  sum_xy
even      3      36      6      72
odd       2      24      4      48
Please note that Category is available in the SELECT part because the results are grouped by it. If a GROUP BY is not specified, then the result would be 1 row and Category would not be available (which value should be displayed if we have sums and averages for multiple rows with different categories?).
What you want is to compute a new column and for this you don't use aggregate functions:
SELECT
(x+y)/2 as avg_xy,
(x+y) as sum_xy
FROM Numbers
This returns all rows:
avg_xy  sum_xy
6       12
12      24
18      36
24      48
If your columns are integers, don't forget to handle rounding, if needed. For example: (CAST(x AS DECIMAL)+y)/2 as avg_xy
The simple arithmetic calculation:
(m.home_goal + m.away_goal) / 2.0
is not exactly equivalent to AVG(), because NULL values mess it up. Databases that support lateral joins provide a pretty easy (and efficient) way to actually use AVG() within a row.
The safe version looks like:
(coalesce(m.home_goal, 0) + coalesce(m.away_goal, 0)) /
nullif( (case when m.home_goal is not null then 1 else 0 end +
case when m.away_goal is not null then 1 else 0 end
), 0
)
Some databases have syntax extensions that allow the expression to be simplified.
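A minimal sketch of how the NULL-safe expression behaves, again run through Python's sqlite3 module on a cut-down "match" table (both are assumptions for the demo). The leading 1.0 * factor is an addition of this sketch, so that the division is not an integer division.

```python
import sqlite3

# The NULL-safe expression from above, with 1.0 * added to force decimal division.
NULL_SAFE_AVG = """
SELECT 1.0 * (COALESCE(home_goal, 0) + COALESCE(away_goal, 0)) /
       NULLIF((CASE WHEN home_goal IS NOT NULL THEN 1 ELSE 0 END +
               CASE WHEN away_goal IS NOT NULL THEN 1 ELSE 0 END), 0) AS avg_goal
FROM "match"
ORDER BY rowid
"""

def null_safe_averages(rows):
    """Average the non-NULL goal values of each row; None if both are NULL."""
    conn = sqlite3.connect(":memory:")
    conn.execute('CREATE TABLE "match" (home_goal INTEGER, away_goal INTEGER)')
    conn.executemany('INSERT INTO "match" VALUES (?, ?)', rows)
    return [avg for (avg,) in conn.execute(NULL_SAFE_AVG)]
```

A row with one NULL is averaged over the single known value, and a row with two NULLs yields NULL instead of a division-by-zero error.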
I have a table with the following fields
ID,Content,QuestionMarks,TypeofQuestion
350, What is the symbol used to represent Bromine?,2,MCQ
758,What is the symbol used to represent Bromine? ,2,MCQ
2425,What is the symbol used to represent Bromine?,3,Essay
2080,A quadrilateral has four sides, four angles ,1,MCQ
2614,A circular cone has a curved surface area of ,2,MCQ
2520,Two triangles have sides 5 cm, 11 cm, 2 cm . ,2,MCQ
2196,Life supporting process mediated by water? ,2,Essay
I would like to get random questions where total marks is an input number.
For example if I say 25, the result should be all the random questions whose Sum(QuestionMarks) is 25(+/-1)
Is this really possible using SQL?
select content,id,questionmarks,sum(questionmarks) from quiz_question
group by content,id,questionmarks;
Expected Input 25
Expected Result (Sum of Question Marks =25)
Update:
How do I ensure I get at least 2 Essay-type questions? (This is just an example; I would extend this for other conditions.) Thank you for all the help.
S-Man's cumulative sum is the right approach. For your logic, though, I think you want to get up to the first row that is 24 or more. That logic is:
where total - questionmark < 24
If you have enough questions, then you could get exactly 25 using:
with q25 as (
select *
from (select t.*,
sum(questionmark) over (order by random()) as running_questionmark
from t
) t
where running_questionmark < 25
)
select q.ID, q.Content, q.QuestionMarks, q.TypeofQuestion
from q25 q
union all
(select t.ID, t.Content, t.QuestionMarks, t.TypeofQuestion
from t cross join
(select sum(questionmark) as questionmark_25 from q25) x
where not exists (select 1 from q25 where q25.id = t.id)
order by abs(questionmark - (25 - questionmark_25))
limit 1
)
This selects questions up to 25 but not at 25. It then tries to find one more to make the total 25.
Suppose questionmark is of type integer, and you want to get some records in random order whose questionmark sum is not more than 25:
You can use the cumulative SUM() window function. The order is random. The cumulative SUM() adds every current value to the previous sum. So, you could filter where SUM() <= <your value>:
demo:db<>fiddle
SELECT
*
FROM (
SELECT
*,
SUM(questionmark) OVER (ORDER BY random()) as total
FROM
t
)s
WHERE total <= 25
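The same running-total idea can be tried locally with Python's sqlite3 module. This is a sketch under a few assumptions: a table t with columns id and questionmark, an SQLite build with window-function support (3.25 or later), and a hypothetical function name. The random key is materialised in an inner query and the frame is spelled out with ROWS, so that tied sort keys cannot merge two rows into one running-total step.

```python
import sqlite3

def random_questions(conn, max_marks=25):
    """Return (id, questionmark) rows in random order, totalling <= max_marks."""
    sql = """
    SELECT id, questionmark
    FROM (SELECT id, questionmark,
                 SUM(questionmark) OVER (
                     ORDER BY r, id
                     ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
                 ) AS total
          FROM (SELECT id, questionmark, random() AS r FROM t))
    WHERE total <= ?
    """
    return conn.execute(sql, (max_marks,)).fetchall()
```

Each call shuffles the questions again, so repeated calls generally return different sets.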
Note:
This returns a list of records whose marks total no more than 25, picked in a random order and running up as close to 25 as that order allows.
Finding an exact match for your value is a combinatorial problem, which shouldn't be solved in a database, especially when there's a random factor. What if your current SUM is 22 and the next randomly chosen value is 4? Would you retry, maybe forever, until you randomly find a value of 3? Or would you try to remove an already counted record with a value of 1?
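If an exact total is really needed, one option is to do the combinatorial part in application code. The sketch below is one possible approach, not something from the answers above: it shuffles the questions and then runs a small subset-sum search. It assumes positive integer marks and takes the questions as (id, marks) pairs; the function name pick_exact is hypothetical.

```python
import random

def pick_exact(questions, target, rng=None):
    """Return ids of a random-ish subset whose marks sum to target, or None.

    questions: list of (id, marks) pairs with positive integer marks.
    """
    rng = rng or random.Random()
    shuffled = list(questions)
    rng.shuffle(shuffled)          # the random factor: order decides which subset wins
    # reachable[s] holds the ids of one subset whose marks add up to s
    reachable = {0: []}
    for qid, marks in shuffled:
        # iterate over a snapshot so each question is used at most once
        for total, ids in list(reachable.items()):
            new_total = total + marks
            if new_total <= target and new_total not in reachable:
                reachable[new_total] = ids + [qid]
    return reachable.get(target)
```

The search keeps at most target + 1 partial sums, so it stays cheap for totals such as 25 even with many questions.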
I have a query based on basic criteria that will return X number of records on any given day.
I'm trying to check the result of the basic query then apply a percentage split to it based on the total of X and split it in 2 buckets. Each bucket will be a percentage of the total query result returned in X.
For example:
Query A returns 3500 records.
If the number of records returned from Query A is <= 3000, then split the 3500 records into a 40% / 60% split (1,400 / 2,100).
If the number of records returned from Query A is >= 3001 and <= 50,000, then split the records into a 10% / 90% split. And so on.
I want the actual records returned, and not just the math acting on the records that returns one row with a number in it (in the column).
I'm not sure how you want to display the different parts of the result set, so I've just added an extra column (part) to it: the value 1 indicates that the row belongs to the first part, and 2 that it belongs to the second part.
select z.*
, case
when cnt_all <= 3000 and cnt <= 40
then 1
when (cnt_all between 3001 and 50000) and (cnt <= 10)
then 1
else 2
end part
from (select t.*
, 100*(count(col1) over(order by col1) / count(col1) over() )cnt
, count(col1) over() cnt_all
from split_rowset t
order by col1
) z
Demo #1 number of rows 3000.
Demo #2 number of rows 3500.
For better usability you can create a view using the query above and then query that view filtering by part column.
Demo #3 using of a view.
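For readers without access to the demos, here is a rough local equivalent run through Python's sqlite3 module (window functions need SQLite 3.25 or later). It is a sketch, not the query above: it uses ROW_NUMBER() instead of the cumulative count, and integer comparisons instead of a percentage column. The table is filled with generated values, and the function name split_parts is hypothetical.

```python
import sqlite3

SPLIT_SQL = """
SELECT col1,
       CASE
         WHEN cnt_all <= 3000 AND 100 * rn <= 40 * cnt_all THEN 1
         WHEN (cnt_all BETWEEN 3001 AND 50000) AND 100 * rn <= 10 * cnt_all THEN 1
         ELSE 2
       END AS part
FROM (SELECT col1,
             ROW_NUMBER() OVER (ORDER BY col1) AS rn,  -- position of the row
             COUNT(*) OVER () AS cnt_all               -- size of the whole result
      FROM split_rowset)
ORDER BY col1
"""

def split_parts(n_rows):
    """Fill split_rowset with n_rows rows and return (size of part 1, size of part 2)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE split_rowset (col1 INTEGER)")
    conn.executemany("INSERT INTO split_rowset VALUES (?)",
                     ((i,) for i in range(1, n_rows + 1)))
    parts = [part for _col1, part in conn.execute(SPLIT_SQL)]
    return parts.count(1), parts.count(2)
```

Every row is still returned; the part column only labels which bucket it falls into.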
I have a process that needs to select rows from a table (queued items). Each row has a quantity column, and I need to select rows where the quantities add up to a specific multiple. The multiple is usually around 4, 8 or 10, but could in theory be any number (odd or even).
Any suggestions on how to select rows where the sum of a field is of a specified multiple?
My first thought would be to use some kind of MOD function, which I believe in SQL Server is the % sign. So the criteria would be something like this:
WHERE MyField % 4 = 0 OR MyField % 8 = 0
It might not be that fast, so another way might be to make a temp table containing, say, 100 values of the X times table (where X is the multiple you are looking for) and join on that.