Multiple JOIN and GROUP BY: What is the issue with my query? - sql

If I run my query like this:
select
Sys.1,Sys.2,Sys.3,
DDD.a,DDD.b,DDD.c,
Gen.x,Gen.y,Gen.z
from sys_table Sys
join ddd_table DDD on (Sys.3=DDD.a)
join gen_table Gen on (DDD.a=Gen.x)
where
Sys.1 = 'string'
AND Sys.2 = 1
AND Sys.3 = 1
GROUP BY a
I get an error 'Ambiguous column name 'a'.
If I specify the table like here:
select
Sys.1,Sys.2,Sys.3,
DDD.a,DDD.b,DDD.c,
Gen.x,Gen.y,Gen.z
from sys_table Sys
join ddd_table DDD on (Sys.3=DDD.a)
join gen_table Gen on (DDD.a=Gen.x)
where
Sys.1 = 'string'
AND Sys.2 = 1
AND Sys.3 = 1
GROUP BY DDD.a
I get the error: Column 'sys_table.1' is invalid in the select list because it is not contained either an aggregate function or the GROUP BY clause.
What am I missing?

Sam yi is correct. The group by must be by all columns that are not part of an aggregate. Some engines allow this loosely to just grab the first instance of the column from whatever table if it is not an aggregate or part of the group by. Since you don't even have any aggregates, you can just apply a "max()" against each one, and if the values are never changing and would be the same across the board, should be no impact.
Also, having column names as just a number (I didnt even think was allowed), is IMO a very bad thing to deal with, but believe this is only because you are hiding actual content in relation to the REAL problem you are encountering.
select
DDD.a,
max( Sys.1 ) as `1`,
max( Sys.2 ) as `2`,
max( Sys.3 ) as `3`,
max( DDD.b ) as b,
max( DDD.c ) as c,
max( Gen.x ) as x,
max( Gen.y ) as y,
max( Gen.z ) as z
from
sys_table Sys
join ddd_table DDD on (Sys.3 = DDD.a)
join gen_table Gen on (DDD.a = Gen.x)
where
Sys.1 = 'string'
AND Sys.2 = 1
AND Sys.3 = 1
GROUP BY
DDD.a

The first error appears probably because the field name "a" exists in more than one tables. In the second query you need to group with all the fields you are selecting (actually you don't need to group at all, there's no meaning to group to all your fields since you don't use an aggregate function).
Hence, your select will be:
select
Sys.1,Sys.2,Sys.3,DDD.a,DDD.b,DDD.c,Gen.x,Gen.y,Gen.z
from
sys_table Sys
join ddd_table DDD on (Sys.3=DDD.a)
join gen_table Gen on (DDD.a=Gen.x)
where
Sys.1 = 'string' AND Sys.2 = 1 AND Sys.3 = 1

Related

How to pivot two rows into two columns

I have the following SQL Query:
select
distinct
Equipment_Reserved.Equipment_Attached_To,
Equipment.Name
from
Equipment,
Studies,
Equipment_Reserved
where
Studies.Study = 'MAINT19-01'
and
Equipment.idEquipment = Equipment_Reserved.Equipment_idEquipment
and
Studies.idStudies = Equipment_Reserved.Studies_idStudies
and
Equipment.Type = 'Probe'
This query produces the following results:
Equipment_Attached_To Name
2297 R1-P1
2297 R1-P2
2299 R1-P3
I would like to change it to the following:
Equipment_Attached_To Name1 Name2
2297 R1-P1 R1-P2
2299 R1-P3 NULL
Thanks for your help!
I'd first change your query from the old, legacy JOIN syntax to an explicit join as it makes the query easier to understand:
SELECT
DISTINCT
Equipment_Reserved.Equipment_Attached_To,
Equipment.Name
FROM
Equipment
INNER JOIN Equipment_Reserved ON Equipment_Reserved.Equipment_idEquipment = Equipment.idEquipment
INNER JOIN Studies ON Studies.idStudies = Equipment_Reserved.Studies_idStudies
WHERE
Studies.Study = 'MAINT19-01'
AND
Equipment.Type = 'Probe'
I don't think you actually need a PIVOT - I think you can do this with a nested query with the ROW_NUMBER function. I've seen that PIVOT queries often have worse query execution plans than nested-queries.
Let's add ROW_NUMBER (which require an ORDER BY as it's a windowing-function) and a matching ORDER BY in the whole query to make it consistent). Let's also use PARTITION BY so it resets the row-number for each Equipment_Attached_To value:
SELECT
DISTINCT
Equipment_Reserved.Equipment_Attached_To,
Equipment.Name,
ROW_NUMBER() OVER (PARTITION BY Equipment_Attached_To ORDER BY [Name]) AS RowNumber
FROM
Equipment
INNER JOIN Equipment_Reserved ON Equipment_Reserved.Equipment_idEquipment = Equipment.idEquipment
INNER JOIN Studies ON Studies.idStudies = Equipment_Reserved.Studies_idStudies
WHERE
Studies.Study = 'MAINT19-01'
AND
Equipment.Type = 'Probe'
ORDER BY
Equipment_Attached_To,
[Name]
This will give output like this:
Equipment_Attached_To Name RowNumber
2297 R1-P1 1
2297 R1-P2 2
2299 R1-P3 1
This can then be split out into explicit columns like so below. The use of MAX() is arbitrary (we could use MIN() instead) and only because we're dealing with a GROUP BY and because the CASE WHEN... restricts the input set to just 1 row anyway.
SELECT
Equipment_Attached_To,
MAX( CASE WHEN RowNumber = 1 THEN [Name] END ) AS Name1,
MAX( CASE WHEN RowNumber = 2 THEN [Name] END ) AS Name2
FROM
(
-- the query from above
)
GROUP BY
Equipment_Attached_To
ORDER BY
Equipment_Attached_To,
Name1,
Name2
So the final query is:
SELECT
Equipment_Attached_To,
MAX( CASE WHEN RowNumber = 1 THEN [Name] END ) AS Name1,
MAX( CASE WHEN RowNumber = 2 THEN [Name] END ) AS Name2
FROM
(
SELECT
DISTINCT
Equipment_Reserved.Equipment_Attached_To,
Equipment.Name,
ROW_NUMBER() OVER (PARTITION BY Equipment_Attached_To ORDER BY [Name]) AS RowNumber
FROM
Equipment
INNER JOIN Equipment_Reserved ON Equipment_Reserved.Equipment_idEquipment = Equipment.idEquipment
INNER JOIN Studies ON Studies.idStudies = Equipment_Reserved.Studies_idStudies
WHERE
Studies.Study = 'MAINT19-01'
AND
Equipment.Type = 'Probe'
)
GROUP BY
Equipment_Attached_To
ORDER BY
Equipment_Attached_To,
Name1,
Name2
Let's start with some basics.
To facilitate reading the code, I added alias to the tables using their initials.
Then, I converted the old join syntax which is partly deprecated to use the standard syntax since 1992 (27 years and people still use the old syntax).
Finally, since there are only 2 possible values, we can use MIN and MAX to separate them in 2 columns.
And because we're using aggregate functions, we remove the DISTINCT and use GROUP BY
The code now looks like this:
SELECT er.Equipment_Attached_To,
--Gets the first row for the id
MIN( e.Name) AS Name1,
--If the MAX is equal to the MIN, returns a NULL. If not, it returns the second value.
NULLIF( MAX(e.Name), MIN( e.Name)) AS Name2
FROM Equipment e
JOIN Studies s ON s.idStudies = er.Studies_idStudies
JOIN Equipment_Reserved er ON e.idEquipment = er.Equipment_idEquipment
WHERE s.Study = 'MAINT19-01'
AND e.Type = 'Probe'
GROUP BY er.Equipment_Attached_To;

Avoiding aggregation when selecting values from tables

I have the following code which selects value from table2 when 'some string' occurs more than once in 1990
SELECT a.value, COUNT(*) AS test
FROM table1 c
JOIN table2 a
ON c.value2 = a.value_2
JOIN table3 o
ON c.value3 = o.value_3
AND o.value4 = 1990
WHERE c.string = 'Some string'
GROUP BY a.value
HAVING COUNT(*) > 1
This works fine but I am attempting to write a query that produces a similar result without using aggregation. I just need to select values with more then 1 c.string and select those rather than counting and selecting the count as well. I thought about searching for pairs of 'some string' occurring in 1990 for a value but am unsure of how to execute this. Pointing me in the right direction would be appreciated! Struggling to find any documentation referencing this. Thank you!
Use window function ROW_NUMBER() to assign a sequence number within the rows of each table2.value. And use window function FIRST_VALUE() to get the largest row number for each table2.value. Use DISTINCT to remove the duplicates:
select distinct value, first_value(rn) over ( order by rn desc) as count
from
(
SELECT a.value , row_number() over (partition by a.value order by null) rn
FROM table1 c
JOIN table2 a
ON c.value2 = a.value_2
JOIN table3 o
ON c.value3 = o.value_3
AND o.value4 = 1990
WHERE c.string = 'Some string' ) t
where rn > 1;
To check for duplicates, you can use 'WHERE EXISTS', as a starting point. You could start by reading this:
https://www.w3schools.com/sql/sql_exists.asp
This will give you quite a long, cumbersome piece of code compared to using aggregation. But I expect that's the point of the task - to show how useful aggregation is.

SQL Error - Column does not exist (in SELECT as)

I am joining two tables: breeds + breed_characteristics (bc)
But I'm getting the following error:
PG::UndefinedColumn: ERROR: column "val" does not exist LINE 11
I'm not sure what's wrong, here is my SQL:
SELECT
breeds.*,
CASE bc.user_val
WHEN NULL THEN bc.value
ELSE (bc.value + (bc.user_val/2))/2
END AS val
FROM
breed_characteristics bc
INNER JOIN breeds ON breeds.id = bc.breed_id
WHERE bc.characteristic_id = 45
AND val BETWEEN 4 AND 5
ORDER BY val DESC
(Executing this query on Postgres through Active Record)
You can't use expression alias val in where clause like that.
It's because there is an order in which SQL is executed specified in the SQL standard. Here, the WHERE clause is evaluated before SELECT and hence, the WHERE clause is not aware of the alias you created in the SELECT. The ORDER BY comes after the SELECT and hence can utilize aliases.
Just replace the alias with the actual case expression like this:
SELECT
breeds.*,
CASE bc.user_val
WHEN NULL THEN bc.value
ELSE (bc.value + (bc.user_val/2))/2
END AS val
FROM
breed_characteristics bc
INNER JOIN breeds ON breeds.id = bc.breed_id
WHERE bc.characteristic_id = 45
AND CASE WHEN bc.user_val is NULL THEN bc.value
ELSE (bc.value + (bc.user_val/2))/2
END BETWEEN 4 AND 5
ORDER BY val DESC
However, you can use alias in order by clause.
One option to avoid restating the CASE expression in multiple places is to use a subquery:
SELECT *
FROM
(
SELECT b.*,
bc.characteristic_id,
CASE WHEN bc.user_val IS NULL THEN bc.value
ELSE (bc.value + (bc.user_val / 2)) / 2
END AS val
FROM breed_characteristics bc
INNER JOIN breeds b
ON breeds.id = bc.breed_id
) t
WHERE t.characteristic_id = 45 AND
t.val BETWEEN 4 AND 5
ORDER BY t.val DESC

PostgreSQL use case when result in where clause

I use complex CASE WHEN for selecting values. I would like to use this result in WHERE clause, but Postgres says column 'd' does not exists.
SELECT id, name, case when complex_with_subqueries_and_multiple_when END AS d
FROM table t WHERE d IS NOT NULL
LIMIT 100, OFFSET 100;
Then I thought I can use it like this:
select * from (
SELECT id, name, case when complex_with_subqueries_and_multiple_when END AS d
FROM table t
LIMIT 100, OFFSET 100) t
WHERE d IS NOT NULL;
But now I am not getting a 100 rows as result. Probably (I am not sure) I could use LIMIT and OFFSET outside select case statement (where WHERE statement is), but I think (I am not sure why) this would be a performance hit.
Case returns array or null. What is the best/fastest way to exclude some rows if result of case statement is null? I need 100 rows (or less if not exists - of course). I am using Postgres 9.4.
Edited:
SELECT count(*) OVER() AS count, t.id, t.size, t.price, t.location, t.user_id, p.city, t.price_type, ht.value as houses_type_value, ST_X(t.coordinates) as x, ST_Y(t.coordinates) AS y,
CASE WHEN t.classification='public' THEN
ARRAY[(SELECT i.filename FROM table_images i WHERE i.table_id=t.id ORDER BY i.weight ASC LIMIT 1), t.description]
WHEN t.classification='protected' THEN
ARRAY[(SELECT i.filename FROM table_images i WHERE i.table_id=t.id ORDER BY i.weight ASC LIMIT 1), t.description]
WHEN t.id IN (SELECT rl.table_id FROM table_private_list rl WHERE rl.owner_id=t.user_id AND rl.user_id=41026) THEN
ARRAY[(SELECT i.filename FROM table_images i WHERE i.table_id=t.id ORDER BY i.weight ASC LIMIT 1), t.description]
ELSE null
END AS main_image_description
FROM table t LEFT JOIN table_modes m ON m.id = t.mode_id
LEFT JOIN table_types y ON y.id = t.type_id
LEFT JOIN post_codes p ON p.id = t.post_code_id
LEFT JOIN table_houses_types ht on ht.id = t.houses_type_id
WHERE datetime_sold IS NULL AND datetime_deleted IS NULL AND t.published=true AND coordinates IS NOT NULL AND coordinates && ST_MakeEnvelope(17.831490030182, 44.404640972306, 12.151558389557, 47.837396630872) AND main_image_description IS NOT NULL
GROUP BY t.id, m.value, y.value, p.city, ht.value ORDER BY t.id LIMIT 100 OFFSET 0
To use the CASE WHEN result in the WHERE clause you need to wrap it up in a subquery like you did, or in a view.
SELECT * FROM (
SELECT id, name, CASE
WHEN name = 'foo' THEN true
WHEN name = 'bar' THEN false
ELSE NULL
END AS c
FROM case_in_where
) t WHERE c IS NOT NULL
With a table containing 1, 'foo', 2, 'bar', 3, 'baz' this will return records 1 & 2. I don't know how long this SQL Fiddle will persist, but here is an example: http://sqlfiddle.com/#!15/1d3b4/3 . Also see https://stackoverflow.com/a/7950920/101151
Your limit is returning less than 100 rows if those 100 rows starting at offset 100 contain records for which d evaluates to NULL. I don't know how to limit the subselect without including your limiting logic (your case statements) re-written to work inside the where clause.
WHERE ... AND (
t.classification='public' OR t.classification='protected'
OR t.id IN (SELECT rl.table_id ... rl.user_id=41026))
The way you write it will be different and it may be annoying to keep the CASE logic in sync with the WHERE limiting statements, but it would allow your limits to work only on matching data.

SQL when one column has duplicate rows, then select row where other column is the min value

I have this table
mt.id, mt.otherId, mt.name, mt.myChar, mt.type
1 10 stack U "question"
2 10 stack D
3 30 stack U "question"
4 10 stack U "whydownvotes"
And I want only
rows with id 2 and 3 returned (without using the id, otherid as parameter) and ensuring name and type are matching against parameters. And when there is a duplicate otherId = then return the row with min myChar value. So far I have this :
select mt.* from myTable mt
where (mt.myChar = 'U' AND (mt.name = 'stack' AND mt.type LIKE '%question%'))
or (mt.myChar = 'D' and mt.name = 'stack')
So where otherID is 10, I want the row with min char value 'D'. I am going to need a subquery or group using min(myChar) ... ?
How do i remove the first row from the sql fiddle (without using the id):
http://sqlfiddle.com/#!9/c579a/1
edit
Jeepers, whats with the downvotes, its clear question isn't it ? There is even a sql fiddle.
If this is SQL Server, then you can do it in two steps like this:
WITH filtered AS (
SELECT
mt.*,
minType = MIN(mt.type) OVER (PARTITION BY mt.otherId)
FROM
dbo.myTable AS mt
WHERE (mt.myChar = 'U' AND mt.name = 'stack' AND mt.type LIKE '%question%')
OR (mt.myChar = 'D' AND mt.name = 'stack')
)
SELECT
id,
otherId,
name,
myChar,
type
FROM
filtered
WHERE
type = minType
;
The filtered subquery is basically your current query but with an additional column that holds minimum type values per otherId. The main query filters the filtered set further based on the type = minType condition.
I am assuming you want is a groupwise maximum, one of the most commonly-asked SQL questions You can try as , This query should work on any DBMS. But If you are using the SQL SERVER then you can use the Row_Number() which is very easy to use.Here myTable is your table.
SELECT t0.*
FROM myTable AS t0
LEFT JOIN myTable AS t1 ON t0.otherId = t1.otherId AND t1.myChar < t0.myChar
WHERE t1.myChar IS NULL;
Here is sql fiddle