SQL count row where progres not done - sql

I need some help; I don't know the right keyword to search for this problem online.
I want to count the progres values that aren't done.
When a progres has a step3 row, it should not be counted.
The desired result for that example is 2. I've tried to do it on my own, and it doesn't work.
Any help is appreciated. Thanks in advance.

One method uses count(distinct) and filters in the where clause:
select count(distinct progres)
from t
where not exists (select 1 from t t2 where t2.progres = t.progres and t2.step = 'step3');
Another fun way uses a difference:
select count(distinct progres) - count(distinct case when step = 'step3' then progres end)
from t;
If 'step3' can appear at most once per progres, the above can be simplified to:
select count(distinct progres) - sum(step = 'step3')
from t;
Or using set operations:
select count(*)
from ((select progres from t)
except -- removes duplicates
(select progres from t where step = 'step3')
) t;
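To compare these variants side by side, here is a minimal sketch that runs three of them against an in-memory SQLite database from Python. The sample rows are invented (the question's example table is not shown), and the EXCEPT query is written without the extra parentheses because SQLite does not accept them there; other databases may behave differently.

```python
import sqlite3

# Hypothetical sample data: the question's example table is not shown,
# so these rows are invented for illustration.
rows = [
    ("A", "step1"), ("A", "step2"), ("A", "step3"),  # has step3, so excluded
    ("B", "step1"),
    ("C", "step1"), ("C", "step2"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (progres TEXT, step TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", rows)

queries = {
    "not_exists": """
        SELECT COUNT(DISTINCT progres)
        FROM t
        WHERE NOT EXISTS (SELECT 1 FROM t t2
                          WHERE t2.progres = t.progres AND t2.step = 'step3')
    """,
    "difference": """
        SELECT COUNT(DISTINCT progres)
             - COUNT(DISTINCT CASE WHEN step = 'step3' THEN progres END)
        FROM t
    """,
    "except": """
        SELECT COUNT(*)
        FROM (SELECT progres FROM t
              EXCEPT
              SELECT progres FROM t WHERE step = 'step3')
    """,
}

# Run every variant and collect the single number each one returns.
results = {name: conn.execute(sql).fetchone()[0] for name, sql in queries.items()}
print(results)
```

With this data, every variant counts B and C and skips A.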

You can do:
select
count(distinct progres)
-
(select count(*) from t where step = 'step3')
from t

Related

postgres: COUNT, DISTINCT is not implemented for window functions

I am trying to use COUNT(DISTINCT column) OVER(PARTITION BY column), that is, COUNT with DISTINCT combined with a window function (OVER).
I get an error like the one in the title and can't get it to work.
I have looked into how to deal with this error, but I have not found an example that covers a query as complex as the one below, and I am not sure how to handle it.
The COUNT part of the problem exists on line 65.
How can such a complex query be fixed without slowing it down?
WITH RECURSIVE "cte" AS((
SELECT
"videos_productvideocomment"."id",
"videos_productvideocomment"."user_id",
"videos_productvideocomment"."video_id",
"videos_productvideocomment"."parent_id",
"videos_productvideocomment"."text",
"videos_productvideocomment"."commented_at",
"videos_productvideocomment"."edited_at",
"videos_productvideocomment"."created_at",
"videos_productvideocomment"."updated_at",
"videos_productvideocomment"."id" AS "root_id"
FROM
"videos_productvideocomment"
WHERE
(
"videos_productvideocomment"."parent_id" IS NULL
AND "videos_productvideocomment"."video_id" = 'f264433c-c0af-49cc-8b40-84453da71b2d'
)
) UNION(
SELECT
"videos_productvideocomment"."id",
"videos_productvideocomment"."user_id",
"videos_productvideocomment"."video_id",
"videos_productvideocomment"."parent_id",
"videos_productvideocomment"."text",
"videos_productvideocomment"."commented_at",
"videos_productvideocomment"."edited_at",
"videos_productvideocomment"."created_at",
"videos_productvideocomment"."updated_at",
"cte"."root_id" AS "root_id"
FROM
"videos_productvideocomment"
INNER JOIN
"cte"
ON "videos_productvideocomment"."parent_id" = "cte"."id"
))
SELECT
*,
EXISTS(
SELECT
(1) AS "a"
FROM
"videos_productvideolikecomment" U0
WHERE
(
U0."comment_id" = t."id"
AND U0."user_id" = '3bd3bc86-0335-481e-9fd2-eb2fb1168f48'
)
LIMIT 1
) AS "liked"
FROM
(
SELECT DISTINCT
"cte"."id",
"cte"."created_at",
"cte"."updated_at",
"cte"."user_id",
"cte"."text",
"cte"."commented_at",
"cte"."edited_at",
"cte"."parent_id",
"cte"."video_id",
"cte"."root_id" AS "root_id",
COUNT(DISTINCT "cte"."root_id") OVER(PARTITION BY "cte"."root_id") AS "reply_count", -- <--- here
COUNT("videos_productvideolikecomment"."id") OVER(PARTITION BY "cte"."id") AS "liked_count"
FROM
"cte"
LEFT OUTER JOIN
"videos_productvideolikecomment"
ON (
"cte"."id" = "videos_productvideolikecomment"."comment_id"
)
) t
WHERE
t."id" = t."root_id"
ORDER BY
CASE
WHEN t."user_id" = '3bd3bc86-0335-481e-9fd2-eb2fb1168f48' THEN 0
ELSE 1
END ASC,
"liked_count" DESC
DISTINCT looks for duplicates and removes them, but on a large data set this query will take a long time to process. I think it would be faster to handle the intermediate records in application code instead. Thanks.
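Another option, if the work should stay in SQL, is to replace the COUNT(DISTINCT ...) OVER (PARTITION BY ...) with a join to a grouped subquery, which needs no window function at all. The sketch below is only an illustration: it assumes the goal is the number of distinct comments per root thread, it uses a made-up two-column table in place of the recursive "cte" result, and it runs on SQLite from Python rather than on PostgreSQL (the query uses only GROUP BY and JOIN, so the same shape should carry over).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, simplified stand-in for the recursive "cte" result:
    -- comments 1, 2, 3 belong to root 1; comment 4 is its own root.
    CREATE TABLE cte (id INTEGER, root_id INTEGER);
    INSERT INTO cte VALUES (1, 1), (2, 1), (3, 1), (4, 4);
""")

sql = """
    SELECT c.id, per_root.comment_count
    FROM cte c
    JOIN (SELECT root_id, COUNT(DISTINCT id) AS comment_count
          FROM cte
          GROUP BY root_id) per_root      -- one row per root thread
      ON per_root.root_id = c.root_id
    WHERE c.id = c.root_id                -- keep only the root comments
    ORDER BY c.id
"""
thread_counts = conn.execute(sql).fetchall()
print(thread_counts)
```

The count here includes the root comment itself; subtract 1 if only replies should be counted.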

Impala SQL Query

Error message:
select list expression not produced by aggregation output (missing from GROUP BY clause?): CASE WHEN (flag = 1) THEN date_add(lead_ctxdt, -1) ELSE ctx_date END lot_endt
Code:
select c.enrolid, c.ctx_date, c.ctx_regimen, c.lead_ctx, c.lead_ctxdt, min(c.ctx_date) as lot_stdt,
case when (flag = 1 ) then date_add(lead_ctxdt, -1)
else ctx_date
end as lot_endt
from
(
select p.*,
case when (ctx_regimen <> lead_ctx) then 1
else 0
end as flag
from
(
select a.*, lead(a.ctx_regimen, 1) over(partition by enrolid order by ctx_date) as lead_ctx,
lead(ctx_date, 1) over (partition by enrolid order by ctx_date) as lead_ctxdt
from
(
select enrolid, ctx_date, group_concat(distinct ctx_codes) as ctx_regimen
from lotinfo
where ctx_date between ctx_date and date_add(ctx_date, 5)
group by enrolid, ctx_date
) as a
) as p
) as c
group by c.enrolid, c.ctx_date, c.ctx_regimen, c.lead_ctx, c.lead_ctxdt
I want lot_endt to be the lead_ctxdt date minus one day when the flag is 1.
I found the answer after running the query several times with minor changes. In Impala, using min() or max() in the same query as group_concat() did not work for me. You have to split them across two query levels: put group_concat() in a subquery and min() in the outer query, or vice versa.
Thank you @dnoeth for helping me realize I already had the answer.

BigQuery use the where clause to filter on a column that not always exists in the table

I need to create some kind of a uniform query for multiple tables. Some tables contain a certain column with a type. If this is the case, I need to apply filtering to it. I don't know how to do this.
I have for example two tables
table_customer_1
CustomerId, CustomerType
1, 1
2, 1
3, 2
Table_customer_2
Customerid
4
5
6
The query needs to be something like the one below and should work for both tables (the table name will be replaced depending on the customer that uses the query):
With input1 as(
SELECT
(CASE WHEN exists(customerType) THEN customerType ELSE "0" END) as customerType, *
FROM table_customer_1)
SELECT * from input1
WHERE customerType != 2
Below is for BigQuery Standard SQL
#standardSQL
SELECT *
FROM `project.dataset.table` t
WHERE SAFE_CAST(IFNULL(JSON_EXTRACT_SCALAR(TO_JSON_STRING(t), '$.CustomerType'), '0') AS INT64) != 2
Or, as a simplification, you can skip the cast to INT64 and compare against a STRING:
#standardSQL
SELECT *
FROM `project.dataset.table` t
WHERE IFNULL(JSON_EXTRACT_SCALAR(TO_JSON_STRING(t), '$.CustomerType'), '0') != '2'
The above will work for whichever table you substitute for project.dataset.table, either project.dataset.table_customer_1 or project.dataset.table_customer_2, so it is quite generic, I think.
I can think of no good reason for doing this. However, it is possible by playing with the scoping rules for subqueries:
SELECT t.*
FROM (SELECT t.*,
(SELECT customerType -- will choose from tt if available, otherwise x
FROM table_customer_1 tt
WHERE tt.Customerid = t.Customerid
) as customerType
FROM (SELECT t.* EXCEPT (Customerid)
FROM table_customer_1 t
) t CROSS JOIN
(SELECT 0 as customerType) x
) t
WHERE customerType <> 2
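To see the scoping rule in action, here is a small sketch that runs the same idea on SQLite from Python. This is not BigQuery, so treat it only as an illustration of how an unqualified column name in a subquery resolves to the inner table first and falls back to the outer scope otherwise. The helper ids_without_type_2 and the alias effective_type are names made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_customer_1 (CustomerId INTEGER, CustomerType INTEGER);
    INSERT INTO table_customer_1 VALUES (1, 1), (2, 1), (3, 2);
    CREATE TABLE table_customer_2 (CustomerId INTEGER);
    INSERT INTO table_customer_2 VALUES (4), (5), (6);
""")

def ids_without_type_2(table):
    # The table name is interpolated only because it comes from the two
    # fixed names used below; never do this with untrusted input.
    sql = f"""
        SELECT t.CustomerId
        FROM (SELECT t.CustomerId,
                     (SELECT CustomerType      -- tt's column if it exists, else x's
                      FROM {table} tt
                      WHERE tt.CustomerId = t.CustomerId) AS effective_type
              FROM {table} t
              CROSS JOIN (SELECT 0 AS CustomerType) x) t
        WHERE effective_type <> 2
        ORDER BY t.CustomerId
    """
    return [row[0] for row in conn.execute(sql)]

result_1 = ids_without_type_2("table_customer_1")  # has CustomerType
result_2 = ids_without_type_2("table_customer_2")  # falls back to the default 0
print(result_1, result_2)
```

For the first table the customer with type 2 is filtered out; for the second table every row gets the default type 0 and is kept.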

Avoiding aggregation when selecting values from tables

I have the following code which selects value from table2 when 'some string' occurs more than once in 1990
SELECT a.value, COUNT(*) AS test
FROM table1 c
JOIN table2 a
ON c.value2 = a.value_2
JOIN table3 o
ON c.value3 = o.value_3
AND o.value4 = 1990
WHERE c.string = 'Some string'
GROUP BY a.value
HAVING COUNT(*) > 1
This works fine, but I am trying to write a query that produces a similar result without using aggregation. I just need to select the values that have more than one matching c.string row, rather than counting them and selecting the count as well. I thought about searching for pairs of 'Some string' rows occurring in 1990 for a value, but I am unsure how to do this. A pointer in the right direction would be appreciated; I'm struggling to find any documentation on this. Thank you!
Use window function ROW_NUMBER() to assign a sequence number within the rows of each table2.value. And use window function FIRST_VALUE() to get the largest row number for each table2.value. Use DISTINCT to remove the duplicates:
select distinct value, first_value(rn) over (partition by value order by rn desc) as count
from
(
SELECT a.value , row_number() over (partition by a.value order by null) rn
FROM table1 c
JOIN table2 a
ON c.value2 = a.value_2
JOIN table3 o
ON c.value3 = o.value_3
AND o.value4 = 1990
WHERE c.string = 'Some string' ) t
where rn > 1;
To check for duplicates, you can use 'WHERE EXISTS', as a starting point. You could start by reading this:
https://www.w3schools.com/sql/sql_exists.asp
This will give you quite a long, cumbersome piece of code compared to using aggregation. But I expect that's the point of the task - to show how useful aggregation is.
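As a rough sketch of the EXISTS idea: a value qualifies when some other row with the same value exists. The example below runs on SQLite from Python and uses a hypothetical table, matched, standing in for the rows left after the joins and the WHERE filter. It also assumes a unique row_id column to tell two rows apart, which the original tables may not have.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical table standing in for the already joined and filtered rows;
    -- row_id is an assumed unique key that the original tables may not have.
    CREATE TABLE matched (row_id INTEGER PRIMARY KEY, value TEXT);
    INSERT INTO matched (value) VALUES ('x'), ('x'), ('y'), ('z'), ('z'), ('z');
""")

sql = """
    SELECT DISTINCT m1.value
    FROM matched m1
    WHERE EXISTS (SELECT 1
                  FROM matched m2
                  WHERE m2.value = m1.value      -- same value ...
                    AND m2.row_id <> m1.row_id)  -- ... but a different row
    ORDER BY m1.value
"""
repeated_values = [row[0] for row in conn.execute(sql)]
print(repeated_values)
```

Here 'y' occurs only once, so only 'x' and 'z' come back, with no COUNT or GROUP BY involved.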

Fetch unique combinations of two field values

Probably it has been asked before but I cannot find an answer.
Table Data has two columns:
Source Dest
1 2
1 2
2 1
3 1
I'm trying to come up with an MS Access 2003 SQL query that will return:
1 2
3 1
But all to no avail. Please help!
UPDATE: exactly, I'm trying to exclude 2,1 because 1,2 already included. I need only unique combinations where sequence doesn't matter.
For Ms Access you can try
SELECT DISTINCT
*
FROM Table1 tM
WHERE NOT EXISTS(SELECT 1 FROM Table1 t WHERE tM.Source = t.Dest AND tM.Dest = t.Source AND tm.Source > t.Source)
EDIT:
Example with table Data, which is the same...
SELECT DISTINCT
*
FROM Data tM
WHERE NOT EXISTS(SELECT 1 FROM Data t WHERE tM.Source = t.Dest AND tM.Dest = t.Source AND tm.Source > t.Source)
or (Nice and Access Formatted...)
SELECT DISTINCT *
FROM Data AS tM
WHERE (((Exists (SELECT 1 FROM Data t WHERE tM.Source = t.Dest AND tM.Dest = t.Source AND tm.Source > t.Source))=False));
Your question is phrased incorrectly: "unique combinations" would be all of your records. But I think you mean one line per Source, so it is:
SELECT *
FROM tab t1
WHERE t1.Dest IN
(
SELECT TOP 1 DISTINCT t2.Dest
FROM tab t2
WHERE t1.Source = t2.Source
)
SELECT t1.* FROM
(SELECT
LEAST(Source, Dest) AS min_val,
GREATEST(Source, Dest) AS max_val
FROM table_name) AS t1
GROUP BY t1.min_val, t1.max_val
Will return
1, 2
1, 3
in MySQL.
To eliminate duplicates, "select distinct" is easier than "group by":
select distinct source,dest from data;
EDIT: I see now that you're trying to get unique combinations (don't include both 1,2 and 2,1). You can do that like:
select distinct source,dest from data
minus
select dest,source from data where source < dest
The "minus" flips the order around and eliminates cases where you already have a match; the "where source < dest" keeps you from removing both (1,2) and (2,1)
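For anyone who wants to check these against the sample data, here is a minimal sketch that runs two of the approaches on SQLite from Python. It is not MS Access or MySQL: the NOT EXISTS query is the one posted above with the columns listed explicitly, and SQLite's two-argument MIN() and MAX() stand in for LEAST() and GREATEST().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Data (Source INTEGER, Dest INTEGER);
    INSERT INTO Data VALUES (1, 2), (1, 2), (2, 1), (3, 1);
""")

# Keeps a row unless its mirror image exists with a smaller Source.
not_exists_sql = """
    SELECT DISTINCT Source, Dest
    FROM Data tM
    WHERE NOT EXISTS (SELECT 1 FROM Data t
                      WHERE tM.Source = t.Dest
                        AND tM.Dest = t.Source
                        AND tM.Source > t.Source)
    ORDER BY Source, Dest
"""

# Normalizes each pair to (smaller, larger) before removing duplicates.
normalized_sql = """
    SELECT DISTINCT MIN(Source, Dest) AS min_val, MAX(Source, Dest) AS max_val
    FROM Data
    ORDER BY min_val, max_val
"""

kept_pairs = conn.execute(not_exists_sql).fetchall()
normalized_pairs = conn.execute(normalized_sql).fetchall()
print(kept_pairs, normalized_pairs)
```

The first query keeps the original orientation of each pair (1,2 and 3,1), while the second reorders every pair so the smaller value comes first (1,2 and 1,3).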
Use this query :
SELECT distinct * from tabval ;