Avoiding aggregation when selecting values from tables - sql

I have the following code, which selects a value from table2 when 'Some string' occurs more than once in 1990:
SELECT a.value, COUNT(*) AS test
FROM table1 c
JOIN table2 a
ON c.value2 = a.value_2
JOIN table3 o
ON c.value3 = o.value_3
AND o.value4 = 1990
WHERE c.string = 'Some string'
GROUP BY a.value
HAVING COUNT(*) > 1
This works fine, but I am attempting to write a query that produces a similar result without using aggregation. I just need to select the values that have more than one matching c.string, rather than counting and selecting the count as well. I thought about searching for pairs of 'Some string' occurring in 1990 for a value, but am unsure how to execute this. Pointing me in the right direction would be appreciated! I'm struggling to find any documentation referencing this. Thank you!

Use the window function ROW_NUMBER() to assign a sequence number to the rows of each table2.value, and the window function FIRST_VALUE() to get the largest row number for each table2.value. Then use DISTINCT to remove the duplicates:
select distinct value, first_value(rn) over (partition by value order by rn desc) as count
from
(
SELECT a.value , row_number() over (partition by a.value order by null) rn
FROM table1 c
JOIN table2 a
ON c.value2 = a.value_2
JOIN table3 o
ON c.value3 = o.value_3
AND o.value4 = 1990
WHERE c.string = 'Some string' ) t
where rn > 1;

To check for duplicates you can use WHERE EXISTS as a starting point. You could begin by reading this:
https://www.w3schools.com/sql/sql_exists.asp
This will give you quite a long, cumbersome piece of code compared to using aggregation. But I expect that's the point of the task - to show how useful aggregation is.
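For example, here is a sketch of that EXISTS idea against the tables in the question (it assumes table1 has a primary key, called id here, so that two matching rows can be told apart):
SELECT DISTINCT a.value
FROM table1 c
JOIN table2 a
ON c.value2 = a.value_2
JOIN table3 o
ON c.value3 = o.value_3
AND o.value4 = 1990
WHERE c.string = 'Some string'
AND EXISTS (
    SELECT 1
    FROM table1 c2
    JOIN table3 o2
    ON c2.value3 = o2.value_3
    AND o2.value4 = 1990
    WHERE c2.value2 = c.value2      -- another row for the same table2 value
    AND c2.string = 'Some string'
    AND c2.id <> c.id               -- assumed primary key on table1; any unique column works
)
The correlated EXISTS asks "is there at least one other qualifying row for this same value?", which is equivalent to HAVING COUNT(*) > 1 but without aggregation.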

Related

postgres: COUNT, DISTINCT is not implemented for window functions

I am trying to use COUNT(DISTINCT column) OVER (PARTITION BY column), i.e. COUNT combined with a window function (OVER).
I get the error in the title and can't get it to work.
I have looked into how to deal with this error, but I have not found an example covering a query as complex as the one below, and I am not sure how to handle it.
The problematic COUNT is on the line marked '<--- here' in the query below.
How can such a complex query be fixed without slowing it down?
WITH RECURSIVE "cte" AS((
SELECT
"videos_productvideocomment"."id",
"videos_productvideocomment"."user_id",
"videos_productvideocomment"."video_id",
"videos_productvideocomment"."parent_id",
"videos_productvideocomment"."text",
"videos_productvideocomment"."commented_at",
"videos_productvideocomment"."edited_at",
"videos_productvideocomment"."created_at",
"videos_productvideocomment"."updated_at",
"videos_productvideocomment"."id" AS "root_id"
FROM
"videos_productvideocomment"
WHERE
(
"videos_productvideocomment"."parent_id" IS NULL
AND "videos_productvideocomment"."video_id" = 'f264433c-c0af-49cc-8b40-84453da71b2d'
)
) UNION(
SELECT
"videos_productvideocomment"."id",
"videos_productvideocomment"."user_id",
"videos_productvideocomment"."video_id",
"videos_productvideocomment"."parent_id",
"videos_productvideocomment"."text",
"videos_productvideocomment"."commented_at",
"videos_productvideocomment"."edited_at",
"videos_productvideocomment"."created_at",
"videos_productvideocomment"."updated_at",
"cte"."root_id" AS "root_id"
FROM
"videos_productvideocomment"
INNER JOIN
"cte"
ON "videos_productvideocomment"."parent_id" = "cte"."id"
))
SELECT
*,
EXISTS(
SELECT
(1) AS "a"
FROM
"videos_productvideolikecomment" U0
WHERE
(
U0."comment_id" = t."id"
AND U0."user_id" = '3bd3bc86-0335-481e-9fd2-eb2fb1168f48'
)
LIMIT 1
) AS "liked"
FROM
(
SELECT DISTINCT
"cte"."id",
"cte"."created_at",
"cte"."updated_at",
"cte"."user_id",
"cte"."text",
"cte"."commented_at",
"cte"."edited_at",
"cte"."parent_id",
"cte"."video_id",
"cte"."root_id" AS "root_id",
COUNT(DISTINCT "cte"."root_id") OVER(PARTITION BY "cte"."root_id") AS "reply_count", <--- here
COUNT("videos_productvideolikecomment"."id") OVER(PARTITION BY "cte"."id") AS "liked_count"
FROM
"cte"
LEFT OUTER JOIN
"videos_productvideolikecomment"
ON (
"cte"."id" = "videos_productvideolikecomment"."comment_id"
)
) t
WHERE
t."id" = t."root_id"
ORDER BY
CASE
WHEN t."user_id" = '3bd3bc86-0335-481e-9fd2-eb2fb1168f48' THEN 0
ELSE 1
END ASC,
"liked_count" DESC
DISTINCT will find and remove the duplicates, but on large data sets this query will take a long time to run. I think it would be faster to de-duplicate the intermediate results in your application code instead. Thanks.
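If you do want to keep it in SQL, one workaround that is often used when Postgres rejects COUNT(DISTINCT ...) OVER (...) is the DENSE_RANK trick. A minimal sketch, assuming the intent is to count distinct "cte"."id" values per "root_id" (that assumption is mine; note also that a NULL would be counted as a value):
-- A possible replacement for the marked line, since Postgres does not support
-- COUNT(DISTINCT ...) as a window function. Within each "root_id" partition:
-- dense_rank ascending + dense_rank descending - 1 = number of distinct "id" values.
DENSE_RANK() OVER (PARTITION BY "cte"."root_id" ORDER BY "cte"."id" ASC)
+ DENSE_RANK() OVER (PARTITION BY "cte"."root_id" ORDER BY "cte"."id" DESC)
- 1 AS "reply_count"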

How to pivot two rows into two columns

I have the following SQL Query:
select
distinct
Equipment_Reserved.Equipment_Attached_To,
Equipment.Name
from
Equipment,
Studies,
Equipment_Reserved
where
Studies.Study = 'MAINT19-01'
and
Equipment.idEquipment = Equipment_Reserved.Equipment_idEquipment
and
Studies.idStudies = Equipment_Reserved.Studies_idStudies
and
Equipment.Type = 'Probe'
This query produces the following results:
Equipment_Attached_To Name
2297 R1-P1
2297 R1-P2
2299 R1-P3
I would like to change it to the following:
Equipment_Attached_To Name1 Name2
2297 R1-P1 R1-P2
2299 R1-P3 NULL
Thanks for your help!
I'd first change your query from the old, legacy JOIN syntax to an explicit join as it makes the query easier to understand:
SELECT
DISTINCT
Equipment_Reserved.Equipment_Attached_To,
Equipment.Name
FROM
Equipment
INNER JOIN Equipment_Reserved ON Equipment_Reserved.Equipment_idEquipment = Equipment.idEquipment
INNER JOIN Studies ON Studies.idStudies = Equipment_Reserved.Studies_idStudies
WHERE
Studies.Study = 'MAINT19-01'
AND
Equipment.Type = 'Probe'
I don't think you actually need a PIVOT - I think you can do this with a nested query and the ROW_NUMBER function. I've seen that PIVOT queries often have worse query execution plans than nested queries.
Let's add ROW_NUMBER (which requires an ORDER BY, as it's a windowing function) and a matching ORDER BY on the whole query to keep it consistent. Let's also use PARTITION BY so the row number resets for each Equipment_Attached_To value:
SELECT
DISTINCT
Equipment_Reserved.Equipment_Attached_To,
Equipment.Name,
ROW_NUMBER() OVER (PARTITION BY Equipment_Attached_To ORDER BY [Name]) AS RowNumber
FROM
Equipment
INNER JOIN Equipment_Reserved ON Equipment_Reserved.Equipment_idEquipment = Equipment.idEquipment
INNER JOIN Studies ON Studies.idStudies = Equipment_Reserved.Studies_idStudies
WHERE
Studies.Study = 'MAINT19-01'
AND
Equipment.Type = 'Probe'
ORDER BY
Equipment_Attached_To,
[Name]
This will give output like this:
Equipment_Attached_To Name RowNumber
2297 R1-P1 1
2297 R1-P2 2
2299 R1-P3 1
This can then be split out into explicit columns as shown below. The use of MAX() is arbitrary (we could use MIN() instead): it's only there because we're using a GROUP BY, and the CASE WHEN... restricts the input set to a single row anyway.
SELECT
Equipment_Attached_To,
MAX( CASE WHEN RowNumber = 1 THEN [Name] END ) AS Name1,
MAX( CASE WHEN RowNumber = 2 THEN [Name] END ) AS Name2
FROM
(
-- the query from above
) AS x
GROUP BY
Equipment_Attached_To
ORDER BY
Equipment_Attached_To,
Name1,
Name2
So the final query is:
SELECT
Equipment_Attached_To,
MAX( CASE WHEN RowNumber = 1 THEN [Name] END ) AS Name1,
MAX( CASE WHEN RowNumber = 2 THEN [Name] END ) AS Name2
FROM
(
SELECT
DISTINCT
Equipment_Reserved.Equipment_Attached_To,
Equipment.Name,
ROW_NUMBER() OVER (PARTITION BY Equipment_Attached_To ORDER BY [Name]) AS RowNumber
FROM
Equipment
INNER JOIN Equipment_Reserved ON Equipment_Reserved.Equipment_idEquipment = Equipment.idEquipment
INNER JOIN Studies ON Studies.idStudies = Equipment_Reserved.Studies_idStudies
WHERE
Studies.Study = 'MAINT19-01'
AND
Equipment.Type = 'Probe'
) AS x
GROUP BY
Equipment_Attached_To
ORDER BY
Equipment_Attached_To,
Name1,
Name2
Let's start with some basics.
To make the code easier to read, I added aliases to the tables using their initials.
Then I converted the old join syntax, which is partly deprecated, to the standard syntax that has existed since 1992 (27 years, and people still use the old syntax).
Finally, since there are only 2 possible values, we can use MIN and MAX to separate them into 2 columns.
And because we're using aggregate functions, we remove the DISTINCT and use GROUP BY.
The code now looks like this:
SELECT er.Equipment_Attached_To,
--Gets the first row for the id
MIN( e.Name) AS Name1,
--If the MAX is equal to the MIN, returns a NULL. If not, it returns the second value.
NULLIF( MAX(e.Name), MIN( e.Name)) AS Name2
FROM Equipment e
JOIN Equipment_Reserved er ON e.idEquipment = er.Equipment_idEquipment
JOIN Studies s ON s.idStudies = er.Studies_idStudies
WHERE s.Study = 'MAINT19-01'
AND e.Type = 'Probe'
GROUP BY er.Equipment_Attached_To;

Count for a list of items, with zero for those that do not exist

If I have a table t1 with:
my_col
------
foo
foo
bar
And I have a list with foo and hello
How can I get:
my_col | count
-------|-------
foo | 2
hello | 0
If I just do
SELECT my_col, COUNT(*)
FROM t1
WHERE my_col in ('foo', 'hello')
GROUP BY my_col
I get
my_col | count
-------|------
foo | 2
without any value for hello.
I specifically want this to work against a list of items, because this will be called from a program where the list is a variable.
Ideally you should maintain a separate table with all the possible column values which you want to appear in your report. In the absence of that, we can try using a CTE here:
WITH cte AS (
SELECT 'foo' AS my_col UNION ALL
SELECT 'bar' UNION ALL
SELECT 'hello'
)
SELECT
a.my_col,
COUNT(b.my_col) AS count
FROM cte a
LEFT JOIN t1 b
ON a.my_col = b.my_col
WHERE
a.my_col IN ('foo', 'hello')
GROUP BY
a.my_col;
Here's yet another way, using values:
select
t2.my_col, count (t1.my_col)
from
(values ('foo'), ('hello')) as t2 (my_col)
left join t1 on t1.my_col = t2.my_col
group by
t2.my_col
Note that count (t1.my_col) returns 0 for "hello" since nulls are not counted. count (*), by contrast, would have returned 1 for "hello" because it counts the row itself.
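To see that difference in a single result set, here is the same query with both counts added (purely illustrative, same t1 and values as above):
select
t2.my_col,
count (t1.my_col) as count_col,  -- 0 for 'hello': NULLs are not counted
count (*) as count_star          -- 1 for 'hello': the outer-joined row itself is counted
from
(values ('foo'), ('hello')) as t2 (my_col)
left join t1 on t1.my_col = t2.my_col
group by
t2.my_col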
You can turn your list into a set of rows and use a LEFT JOIN, like:
SELECT x.val, COUNT(t.my_col)
FROM
(SELECT 'foo' val UNION SELECT 'hello') x
LEFT JOIN t1 t ON t.my_col = x.val
GROUP BY x.val
Postgres solution:
One way is to place the 'list' into an ARRAY, and then convert the ARRAY into a column using unnest. Then perform a left join on that column with the other table and perform a count.
WITH t1 AS (
SELECT 'foo' AS my_col UNION ALL
SELECT 'foo' UNION ALL
SELECT 'bar'
)
SELECT
a.my_col,
COUNT(b.my_col) AS count
FROM unnest(ARRAY['foo', 'hello']) a (my_col)
LEFT JOIN t1 b
ON a.my_col = b.my_col
GROUP BY
a.my_col;
The issue I had with the other answers is that (while they helped me get to the solution) they did not provide a solution where the items of interest were in a single list (which isn't an actual SQL term, so the fault is on me).
However, my real use case is to perform a native query using Java and Hibernate, and unfortunately the above does not work because the typing cannot be determined. Instead, I converted my list into a single string and used string_to_array in place of the ARRAY constructor.
So the solution that worked best for my use case is below. (At this point the other answers would be just as correct, since I'm now doing manual string manipulation, but I'm leaving this here for the sake of posterity.)
WITH t1 AS (
SELECT 'foo' AS my_col UNION ALL
SELECT 'foo' UNION ALL
SELECT 'bar'
)
SELECT
a.my_col,
COUNT(b.my_col) AS count
FROM unnest(string_to_array('foo,hello', ',')) a (my_col) -- no space after the comma here, otherwise ' hello' (with a leading space) would not match
LEFT JOIN t1 b
ON a.my_col = b.my_col
GROUP BY
a.my_col;

PostgreSQL use case when result in where clause

I use a complex CASE WHEN expression for selecting values. I would like to use this result in the WHERE clause, but Postgres says column 'd' does not exist.
SELECT id, name, case when complex_with_subqueries_and_multiple_when END AS d
FROM table t WHERE d IS NOT NULL
LIMIT 100 OFFSET 100;
Then I thought I can use it like this:
select * from (
SELECT id, name, case when complex_with_subqueries_and_multiple_when END AS d
FROM table t
LIMIT 100 OFFSET 100) t
WHERE d IS NOT NULL;
But now I am not getting 100 rows as a result. I could probably (I am not sure) move the LIMIT and OFFSET outside the wrapped CASE query (where the WHERE clause is), but I think (I am not sure why) this would be a performance hit.
The CASE returns an array or null. What is the best/fastest way to exclude rows where the result of the CASE statement is null? I need 100 rows (or fewer if they don't exist, of course). I am using Postgres 9.4.
Edited:
SELECT count(*) OVER() AS count, t.id, t.size, t.price, t.location, t.user_id, p.city, t.price_type, ht.value as houses_type_value, ST_X(t.coordinates) as x, ST_Y(t.coordinates) AS y,
CASE WHEN t.classification='public' THEN
ARRAY[(SELECT i.filename FROM table_images i WHERE i.table_id=t.id ORDER BY i.weight ASC LIMIT 1), t.description]
WHEN t.classification='protected' THEN
ARRAY[(SELECT i.filename FROM table_images i WHERE i.table_id=t.id ORDER BY i.weight ASC LIMIT 1), t.description]
WHEN t.id IN (SELECT rl.table_id FROM table_private_list rl WHERE rl.owner_id=t.user_id AND rl.user_id=41026) THEN
ARRAY[(SELECT i.filename FROM table_images i WHERE i.table_id=t.id ORDER BY i.weight ASC LIMIT 1), t.description]
ELSE null
END AS main_image_description
FROM table t LEFT JOIN table_modes m ON m.id = t.mode_id
LEFT JOIN table_types y ON y.id = t.type_id
LEFT JOIN post_codes p ON p.id = t.post_code_id
LEFT JOIN table_houses_types ht on ht.id = t.houses_type_id
WHERE datetime_sold IS NULL AND datetime_deleted IS NULL AND t.published=true AND coordinates IS NOT NULL AND coordinates && ST_MakeEnvelope(17.831490030182, 44.404640972306, 12.151558389557, 47.837396630872) AND main_image_description IS NOT NULL
GROUP BY t.id, m.value, y.value, p.city, ht.value ORDER BY t.id LIMIT 100 OFFSET 0
To use the CASE WHEN result in the WHERE clause you need to wrap it up in a subquery like you did, or in a view.
SELECT * FROM (
SELECT id, name, CASE
WHEN name = 'foo' THEN true
WHEN name = 'bar' THEN false
ELSE NULL
END AS c
FROM case_in_where
) t WHERE c IS NOT NULL
With a table containing 1, 'foo', 2, 'bar', 3, 'baz' this will return records 1 & 2. I don't know how long this SQL Fiddle will persist, but here is an example: http://sqlfiddle.com/#!15/1d3b4/3 . Also see https://stackoverflow.com/a/7950920/101151
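The view variant mentioned above works the same way; here is a minimal sketch using the same example table (the view name case_in_where_v is made up):
-- Put the CASE behind a view, then filter on the exposed column.
CREATE VIEW case_in_where_v AS
SELECT id, name, CASE
WHEN name = 'foo' THEN true
WHEN name = 'bar' THEN false
ELSE NULL
END AS c
FROM case_in_where;

SELECT * FROM case_in_where_v WHERE c IS NOT NULL;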
Your LIMIT returns fewer than 100 rows if the 100 rows starting at offset 100 contain records for which d evaluates to NULL. I don't see a way to limit the subselect without rewriting your limiting logic (your CASE statements) to work inside the WHERE clause.
WHERE ... AND (
t.classification='public' OR t.classification='protected'
OR t.id IN (SELECT rl.table_id ... rl.user_id=41026))
The way you write it will be different and it may be annoying to keep the CASE logic in sync with the WHERE limiting statements, but it would allow your limits to work only on matching data.

SQL ROW_NUMBER OVER syntax

I have this:
SELECT ROW_NUMBER() OVER (ORDER BY vwmain.ch) as RowNumber,
vwmain.vehicleref,vwmain.capid,
vwmain.manufacturer,vwmain.model,vwmain.derivative,
vwmain.isspecial,
vwmain.created,vwmain.updated,vwmain.stocklevel,
vwmain.[type],
vwmain.ch,vwmain.co2,vwmain.mpg,vwmain.term,vwmain.milespa
FROM vwMain_LATEST vwmain
INNER JOIN HomepageFeatured
on vwMain.vehicleref = homepageFeatured.vehicleref
WHERE homepagefeatured.siteskinid = 1
AND homepagefeatured.Rotator = 1
AND RowNumber = 1
ORDER BY homepagefeatured.orderby
It fails on "Invalid column name RowNumber"
Not sure how to prefix it to access it?
Thanks
You can't reference the column alias like that. You can, however, use a subquery or a common table expression (CTE):
Here's a subquery:
SELECT *
FROM (
SELECT ROW_NUMBER() OVER (ORDER BY vwmain.ch) as RowNumber,
vwmain.vehicleref,vwmain.capid,
vwmain.manufacturer,vwmain.model,vwmain.derivative,
vwmain.isspecial,
vwmain.created,vwmain.updated,vwmain.stocklevel,
vwmain.[type],
vwmain.ch,vwmain.co2,vwmain.mpg,vwmain.term,vwmain.milespa,
homepagefeatured.orderby
FROM vwMain_LATEST vwmain
INNER JOIN HomepageFeatured on vwMain.vehicleref = homepageFeatured.vehicleref
WHERE homepagefeatured.siteskinid = 1
AND homepagefeatured.Rotator = 1
) T
WHERE RowNumber = 1
ORDER BY orderby
Rereading your query: since you aren't partitioning by any fields, RowNumber = 1 selects just one row, so the ORDER BY at the end is useless (it contradicts the ordering of the ranking function anyway). You're probably better off using TOP 1...
Using top:
SELECT top 1 vwmain.vehicleref,vwmain.capid,
vwmain.manufacturer,vwmain.model,vwmain.derivative,
vwmain.isspecial,
vwmain.created,vwmain.updated,vwmain.stocklevel,
vwmain.[type],
vwmain.ch,vwmain.co2,vwmain.mpg,vwmain.term,vwmain.milespa,
homepagefeatured.orderby
FROM vwMain_LATEST vwmain
INNER JOIN HomepageFeatured on vwMain.vehicleref = homepageFeatured.vehicleref
WHERE homepagefeatured.siteskinid = 1
AND homepagefeatured.Rotator = 1
ORDER BY homepagefeatured.orderby
(EDIT: This answer is a nearby pitfall and does not actually work. I leave it here for documentation.)
Have a look here: Referring to a Column Alias in a WHERE Clause
This is the same situation.
It is a matter of how the SQL query is parsed/compiled internally: your column alias names are not yet known at the time the WHERE clause is interpreted. Therefore you might try, in reference to the example above:
SELECT ROW_NUMBER() OVER (ORDER BY vwmain.ch) as RowNumber,
vwmain.vehicleref,vwmain.capid, vwmain.manufacturer,vwmain.model,vwmain.derivative, vwmain.isspecial,vwmain.created,vwmain.updated,vwmain.stocklevel, vwmain.[type],
vwmain.ch,vwmain.co2,vwmain.mpg,vwmain.term,vwmain.milespa
FROM vwMain_LATEST vwmain
INNER JOIN HomepageFeatured on vwMain.vehicleref = homepageFeatured.vehicleref
WHERE homepagefeatured.siteskinid = 1
AND homepagefeatured.Rotator = 1
AND ROW_NUMBER() OVER (ORDER BY vwmain.ch) = 1
ORDER BY homepagefeatured.orderby
Thus the expression from the SELECT list is repeated verbatim in the WHERE clause. (Note, however, that SQL Server only allows window functions in the SELECT list and the ORDER BY clause, not in WHERE, which is why this approach is the pitfall mentioned above.)