Using COUNTIF inside aggregation function - sql

Im making a betting system app. I need to count the points of a user based on his bets, knowing that some of the bets can be 'combined', which makes the calcul a bit more complex than a simple addition.
So if i have 3 bets: {points: 3, combined: false}, {points: 5, combined: true}, {points: 10, combined: true}, there are two bets combined here, so the total points should be 3 + (5 * 2) + (10 * 2). Reality is a bit more complex since the points are not directly in the bet object but in the match it refers to
Here is a part of my query. As you can see, i first check if the bet is right based on the match result, in that case if the bet is combined I multiply it by the value of combinedLength, else i'll just sum the value of that bet. I tried to replicate the COUNTIF inside the CASE, which gaves me an error like 'cannot aggregation inside aggregation'.
SELECT
JSON_EXTRACT_SCALAR(data, '$.userId') AS userId,
COUNTIF(JSON_EXTRACT_SCALAR(data, '$.combined') = 'true') AS combinedLength,
SUM (
(
CASE WHEN JSON_EXTRACT_SCALAR(data, '$.value') = match.result
THEN IF(JSON_EXTRACT_SCALAR(data, '$.combined') = "true", match.odd * combinedLength, match.odd)
ELSE 0
END
)
) AS totalScore,
FROM data.user_bets_raw_latest
INNER JOIN matchLines ON matchLines.match.matchId = JSON_EXTRACT(data, '$.fixtureId')
GROUP BY userId
I've been looking for days... thanks so much for the help !

If I follow you correctly, then you need to count the total number of combined bets per user in a subquery, using a window function. Then, you can use this information while aggregating.
Consider:
select
user_id,
sum(case when combined = 'true' then odd * cnt_combined else odd end) total_score
from (
select
u.*,
m.match.odd,
countif(b.combined = 'true') over(partition by userid) as cnt_combined,
from (
select
json_extract_scalar(data, '$.userid') userid,
json_extract_scalar(data, '$.combined') combined,
json_extract_scalar(data, '$.value') value,
json_extract_scalar(data, '$.fixtureid') fixtureid
from data.user_bets_raw_latest
) b
left join matchlines m
on m.match.matchid = b.fixtureid
and m.match.result = b.value
) t
group by userid
I find that it is simpler to use a left join and put the condition on the match result in there.
I moved the json extractions to a subquery to reduce the length of the query.

Related

postgres: COUNT, DISTINCT is not implemented for window functions

I am trying to use COUNT(DISTINC column) OVER(PARTITION BY column) when I am using COUNT + window function(OVER).
I get an error like the one in the title and can't get it to work.
I have looked into how to deal with this error, but I have not found an example of how to deal with such a complex query as the one below.
I cannot find an example of how to deal with such a complex query as shown below, and I am not sure how to handle it.
The COUNT part of the problem exists on line 65.
How can such a complex query be resolved without slowing down?
WITH RECURSIVE "cte" AS((
SELECT
"videos_productvideocomment"."id",
"videos_productvideocomment"."user_id",
"videos_productvideocomment"."video_id",
"videos_productvideocomment"."parent_id",
"videos_productvideocomment"."text",
"videos_productvideocomment"."commented_at",
"videos_productvideocomment"."edited_at",
"videos_productvideocomment"."created_at",
"videos_productvideocomment"."updated_at",
"videos_productvideocomment"."id" AS "root_id"
FROM
"videos_productvideocomment"
WHERE
(
"videos_productvideocomment"."parent_id" IS NULL
AND "videos_productvideocomment"."video_id" = 'f264433c-c0af-49cc-8b40-84453da71b2d'
)
) UNION(
SELECT
"videos_productvideocomment"."id",
"videos_productvideocomment"."user_id",
"videos_productvideocomment"."video_id",
"videos_productvideocomment"."parent_id",
"videos_productvideocomment"."text",
"videos_productvideocomment"."commented_at",
"videos_productvideocomment"."edited_at",
"videos_productvideocomment"."created_at",
"videos_productvideocomment"."updated_at",
"cte"."root_id" AS "root_id"
FROM
"videos_productvideocomment"
INNER JOIN
"cte"
ON "videos_productvideocomment"."parent_id" = "cte"."id"
))
SELECT
*,
EXISTS(
SELECT
(1) AS "a"
FROM
"videos_productvideolikecomment" U0
WHERE
(
U0."comment_id" = t."id"
AND U0."user_id" = '3bd3bc86-0335-481e-9fd2-eb2fb1168f48'
)
LIMIT 1
) AS "liked"
FROM
(
SELECT DISTINCT
"cte"."id",
"cte"."created_at",
"cte"."updated_at",
"cte"."user_id",
"cte"."text",
"cte"."commented_at",
"cte"."edited_at",
"cte"."parent_id",
"cte"."video_id",
"cte"."root_id" AS "root_id",
COUNT(DISTINCT "cte"."root_id") OVER(PARTITION BY "cte"."root_id") AS "reply_count", <--- here
COUNT("videos_productvideolikecomment"."id") OVER(PARTITION BY "cte"."id") AS "liked_count"
FROM
"cte"
LEFT OUTER JOIN
"videos_productvideolikecomment"
ON (
"cte"."id" = "videos_productvideolikecomment"."comment_id"
)
) t
WHERE
t."id" = t."root_id"
ORDER BY
CASE
WHEN t."user_id" = '3bd3bc86-0335-481e-9fd2-eb2fb1168f48' THEN 0
ELSE 1
END ASC,
"liked_count" DESC
DISTINCT will look for duplicates and remove it, but in big data it will take a lot of time to process this query, you should process the middle of the record in the programming part I think it will be fast than. Thank

BigQuery use the where clause to filter on a column that not always exists in the table

I need to create some kind of a uniform query for multiple tables. Some tables contain a certain column with a type. If this is the case, I need to apply filtering to it. I don't know how to do this.
I have for example two tables
table_customer_1
CustomerId, CustomerType
1, 1
2, 1
3, 2
Table_customer_2
Customerid
4
5
6
The query needs to be something like the one below and should work for both tables (the table name wil be replaced by the customer that uses the query):
With input1 as(
SELECT
(CASE WHEN exists(customerType) THEN customerType ELSE "0" END) as customerType, *
FROM table_customer_1)
SELECT * from input1
WHERE customerType != 2
Below is for BigQuery Standard SQL
#standardSQL
SELECT *
FROM `project.dataset.table` t
WHERE SAFE_CAST(IFNULL(JSON_EXTRACT_SCALAR(TO_JSON_STRING(t), '$.CustomerType'), '0') AS INT64) != 2
or as a simplification you can ignore casting to INT64 and use comparison to STRING
#standardSQL
SELECT *
FROM `project.dataset.table` t
WHERE IFNULL(JSON_EXTRACT_SCALAR(TO_JSON_STRING(t), '$.CustomerType'), '0') != '2'
above will work for whatever table you put instead of project.dataset.table: either project.dataset.table_customer_1 or project.dataset.table_customer_2 - so quite generic I think
I can think of no good reason for doing this. However, it is possible by playing with the scoping rules for subqueries:
SELECT t.*
FROM (SELECT t.*,
(SELECT customerType -- will choose from tt if available, otherwise x
FROM table_customer_1 tt
WHERE tt.Customerid = t.Customerid
) as customerType
FROM (SELECT t.* EXCEPT (Customerid)
FROM table_customer_1 t
) t CROSS JOIN
(SELECT 0 as customerType) x
) t
WHERE customerType <> 2

Avoiding aggregation when selecting values from tables

I have the following code which selects value from table2 when 'some string' occurs more than once in 1990
SELECT a.value, COUNT(*) AS test
FROM table1 c
JOIN table2 a
ON c.value2 = a.value_2
JOIN table3 o
ON c.value3 = o.value_3
AND o.value4 = 1990
WHERE c.string = 'Some string'
GROUP BY a.value
HAVING COUNT(*) > 1
This works fine but I am attempting to write a query that produces a similar result without using aggregation. I just need to select values with more then 1 c.string and select those rather than counting and selecting the count as well. I thought about searching for pairs of 'some string' occurring in 1990 for a value but am unsure of how to execute this. Pointing me in the right direction would be appreciated! Struggling to find any documentation referencing this. Thank you!
Use window function ROW_NUMBER() to assign a sequence number within the rows of each table2.value. And use window function FIRST_VALUE() to get the largest row number for each table2.value. Use DISTINCT to remove the duplicates:
select distinct value, first_value(rn) over ( order by rn desc) as count
from
(
SELECT a.value , row_number() over (partition by a.value order by null) rn
FROM table1 c
JOIN table2 a
ON c.value2 = a.value_2
JOIN table3 o
ON c.value3 = o.value_3
AND o.value4 = 1990
WHERE c.string = 'Some string' ) t
where rn > 1;
To check for duplicates, you can use 'WHERE EXISTS', as a starting point. You could start by reading this:
https://www.w3schools.com/sql/sql_exists.asp
This will give you quite a long, cumbersome piece of code compared to using aggregation. But I expect that's the point of the task - to show how useful aggregation is.

PostgreSQL use case when result in where clause

I use complex CASE WHEN for selecting values. I would like to use this result in WHERE clause, but Postgres says column 'd' does not exists.
SELECT id, name, case when complex_with_subqueries_and_multiple_when END AS d
FROM table t WHERE d IS NOT NULL
LIMIT 100, OFFSET 100;
Then I thought I can use it like this:
select * from (
SELECT id, name, case when complex_with_subqueries_and_multiple_when END AS d
FROM table t
LIMIT 100, OFFSET 100) t
WHERE d IS NOT NULL;
But now I am not getting a 100 rows as result. Probably (I am not sure) I could use LIMIT and OFFSET outside select case statement (where WHERE statement is), but I think (I am not sure why) this would be a performance hit.
Case returns array or null. What is the best/fastest way to exclude some rows if result of case statement is null? I need 100 rows (or less if not exists - of course). I am using Postgres 9.4.
Edited:
SELECT count(*) OVER() AS count, t.id, t.size, t.price, t.location, t.user_id, p.city, t.price_type, ht.value as houses_type_value, ST_X(t.coordinates) as x, ST_Y(t.coordinates) AS y,
CASE WHEN t.classification='public' THEN
ARRAY[(SELECT i.filename FROM table_images i WHERE i.table_id=t.id ORDER BY i.weight ASC LIMIT 1), t.description]
WHEN t.classification='protected' THEN
ARRAY[(SELECT i.filename FROM table_images i WHERE i.table_id=t.id ORDER BY i.weight ASC LIMIT 1), t.description]
WHEN t.id IN (SELECT rl.table_id FROM table_private_list rl WHERE rl.owner_id=t.user_id AND rl.user_id=41026) THEN
ARRAY[(SELECT i.filename FROM table_images i WHERE i.table_id=t.id ORDER BY i.weight ASC LIMIT 1), t.description]
ELSE null
END AS main_image_description
FROM table t LEFT JOIN table_modes m ON m.id = t.mode_id
LEFT JOIN table_types y ON y.id = t.type_id
LEFT JOIN post_codes p ON p.id = t.post_code_id
LEFT JOIN table_houses_types ht on ht.id = t.houses_type_id
WHERE datetime_sold IS NULL AND datetime_deleted IS NULL AND t.published=true AND coordinates IS NOT NULL AND coordinates && ST_MakeEnvelope(17.831490030182, 44.404640972306, 12.151558389557, 47.837396630872) AND main_image_description IS NOT NULL
GROUP BY t.id, m.value, y.value, p.city, ht.value ORDER BY t.id LIMIT 100 OFFSET 0
To use the CASE WHEN result in the WHERE clause you need to wrap it up in a subquery like you did, or in a view.
SELECT * FROM (
SELECT id, name, CASE
WHEN name = 'foo' THEN true
WHEN name = 'bar' THEN false
ELSE NULL
END AS c
FROM case_in_where
) t WHERE c IS NOT NULL
With a table containing 1, 'foo', 2, 'bar', 3, 'baz' this will return records 1 & 2. I don't know how long this SQL Fiddle will persist, but here is an example: http://sqlfiddle.com/#!15/1d3b4/3 . Also see https://stackoverflow.com/a/7950920/101151
Your limit is returning less than 100 rows if those 100 rows starting at offset 100 contain records for which d evaluates to NULL. I don't know how to limit the subselect without including your limiting logic (your case statements) re-written to work inside the where clause.
WHERE ... AND (
t.classification='public' OR t.classification='protected'
OR t.id IN (SELECT rl.table_id ... rl.user_id=41026))
The way you write it will be different and it may be annoying to keep the CASE logic in sync with the WHERE limiting statements, but it would allow your limits to work only on matching data.

troubles with next and previous query

I have a list and the returned table looks like this. I took the preview of only one car but there are many more.
What I need to do now is check that the current KM value is larger then the previous and smaller then the next. If this is not the case I need to make a field called Trustworthy and should fill it with either 1 or 0 (true/ false).
The result that I have so far is this:
validKMstand and validkmstand2 are how I calculate it. It did not work in one list so that is why I separated it.
In both of my tries my code does not work.
Here is the code that I have so far.
FullList as (
SELECT
*
FROM
eMK_Mileage as Mileage
)
, ValidChecked1 as (
SELECT
UL1.*,
CASE WHEN EXISTS(
SELECT TOP(1)UL2.*
FROM FullList AS UL2
WHERE
UL2.FK_CarID = UL1.FK_CarID AND
UL1.KM_Date > UL2.KM_Date AND
UL1.KM > UL2.KM
ORDER BY UL2.KM_Date DESC
)
THEN 1
ELSE 0
END AS validkmstand
FROM FullList as UL1
)
, ValidChecked2 as (
SELECT
List1.*,
(CASE WHEN List1.KM > ulprev.KM
THEN 1
ELSE 0
END
) AS validkmstand2
FROM ValidChecked1 as List1 outer apply
(SELECT TOP(1)UL3.*
FROM ValidChecked1 AS UL3
WHERE
UL3.FK_CarID = List1.FK_CarID AND
UL3.KM_Date <= List1.KM_Date AND
List1.KM > UL3.KM
ORDER BY UL3.KM_Date DESC) ulprev
)
SELECT * FROM ValidChecked2 order by FK_CarID, KM_Date
Maybe something like this is what you are looking for?
;with data as
(
select *, rn = row_number() over (partition by fk_carid order by km_date)
from eMK_Mileage
)
select
d.FK_CarID, d.KM, d.KM_Date,
valid =
case
when (d.KM > d_prev.KM /* or d_prev.KM is null */)
and (d.KM < d_next.KM /* or d_next.KM is null */)
then 1 else 0
end
from data d
left join data d_prev on d.FK_CarID = d_prev.FK_CarID and d_prev.rn = d.rn - 1
left join data d_next on d.FK_CarID = d_next.FK_CarID and d_next.rn = d.rn + 1
order by d.FK_CarID, d.KM_Date
With SQL Server versions 2012+ you could have used the lag() and lead() analytical functions to access the previous/next rows, but in versions before you can accomplish the same thing by numbering rows within partitions of the set. There are other ways too, like using correlated subqueries.
I left a couple of conditions commented out that deal with the first and last rows for every car - maybe those should be considered valid is they fulfill only one part of the comparison (since the previous/next rows are null)?