SQL cartesian product turns - sql

I am trying to understand the cartesian product with the SELECT command,
but when I try different combinations I get different results. For example, when I type
select X.A,Y.A,Z.A
From X,Y,Z
I get the full product X x Y x Z,
but if I try
select X.A,Y.A,Z.A
From X,Y,Z
where (conditions)
I get different combinations again, depending on how I write the conditions.

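A comma-separated FROM list with no WHERE clause is a cross join: every row of X is paired with every row of Y and every row of Z. A WHERE clause filters that product, so different conditions keep different subsets of the combinations. Depending on the database you are using, you can also write this explicitly with CROSS JOIN.
Also, the order of the rows is not guaranteed and can vary with the data; to get a consistent order, use an ORDER BY clause.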
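To make the difference concrete, here is a minimal sketch using Python's built-in sqlite3 module. The tables X, Y, Z come from the question, but their contents are invented for illustration, and other databases may differ in details.

```python
import sqlite3

# Hypothetical sample data: three tables, each with a single column A.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE X (A INTEGER);
    CREATE TABLE Y (A INTEGER);
    CREATE TABLE Z (A INTEGER);
    INSERT INTO X VALUES (1), (2);
    INSERT INTO Y VALUES (1), (2), (3);
    INSERT INTO Z VALUES (2), (3);
""")

# No WHERE clause: every combination of rows, i.e. 2 * 3 * 2 = 12 rows.
product = conn.execute(
    "SELECT X.A, Y.A, Z.A FROM X CROSS JOIN Y CROSS JOIN Z "
    "ORDER BY X.A, Y.A, Z.A"
).fetchall()

# A WHERE clause keeps only the combinations that satisfy the conditions.
filtered = conn.execute(
    "SELECT X.A, Y.A, Z.A FROM X, Y, Z "
    "WHERE X.A = Y.A AND Y.A = Z.A "
    "ORDER BY X.A"
).fetchall()

print(len(product))  # 12
print(filtered)      # [(2, 2, 2)]
```

Changing the conditions changes which of the 12 combinations survive, which is why each variation of the WHERE clause gives a different result.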

Related

How to write a query that lists the name of something that appears in more than one category

I'm new, so please excuse me if this is a dumb question, but I can't figure it out.
Question: Write a query that lists the name of Products that appear in more than one Recipe.
These are the tables
The simplest solution to this question is to use a GROUP BY expression with a HAVING clause to limit the results to groups that have a count of more than 1:
SELECT Nome_produto
FROM Receita
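-- INNER JOIN: with LEFT JOIN, recipes without ingredients would form a NULL group
INNER JOIN Ingrediente ON Receita.Codigo_receita = Ingrediente.Codigo_receita
INNER JOIN Produto ON Ingrediente.Codigo_produto = Produto.Codigo_produto
GROUP BY Nome_produto
-- DISTINCT: a product listed twice in the same recipe still counts as one recipe
HAVING COUNT(DISTINCT Receita.Codigo_receita) > 1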
Nome_produto
Ovos
The key to solving a problem like this is to first build the relationship between the entities that the requirement refers to. In this case I'm selecting data FROM the recipe table first, as this is the subject that the principal predicate must be applied to:
...that appear in more than one Recipe.
Once we have a denormalized or flat table of results, we can apply the grouping with a HAVING clause. HAVING is similar to WHERE except that it is evaluated after the grouping, which allows us to use the results of aggregate expressions in conditions.
WHERE is applied to individual rows and excludes rows before the grouping sets are realized, and therefore before aggregates are evaluated.
Read more about the SQL HAVING clause at W3Schools.
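The WHERE versus HAVING distinction can be sketched with Python's built-in sqlite3 module. The table and column names follow the answer's query, but the column layout and the sample rows are assumptions, since the original tables are not shown.

```python
import sqlite3

# Assumed schema and invented rows: Ovos is used in two recipes, Farinha in one.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Produto (Codigo_produto INTEGER PRIMARY KEY, Nome_produto TEXT);
    CREATE TABLE Receita (Codigo_receita INTEGER PRIMARY KEY);
    CREATE TABLE Ingrediente (Codigo_receita INTEGER, Codigo_produto INTEGER);
    INSERT INTO Produto VALUES (1, 'Ovos'), (2, 'Farinha');
    INSERT INTO Receita VALUES (10), (20);
    INSERT INTO Ingrediente VALUES (10, 1), (10, 2), (20, 1);
""")

BASE = """
    SELECT p.Nome_produto, COUNT(r.Codigo_receita)
    FROM Receita r
    JOIN Ingrediente i ON r.Codigo_receita = i.Codigo_receita
    JOIN Produto p ON i.Codigo_produto = p.Codigo_produto
    {where}
    GROUP BY p.Nome_produto
    {having}
    ORDER BY p.Nome_produto
"""

def run(where="", having=""):
    """Run the grouped query with optional WHERE and HAVING clauses."""
    return conn.execute(BASE.format(where=where, having=having)).fetchall()

# Every group with its recipe count.
all_groups = run()
# HAVING filters groups after the counts are known.
with_having = run(having="HAVING COUNT(r.Codigo_receita) > 1")
# WHERE removes rows before grouping, so no group can reach a count of 2 here.
with_where = run(where="WHERE r.Codigo_receita = 10",
                 having="HAVING COUNT(r.Codigo_receita) > 1")

print(all_groups)   # [('Farinha', 1), ('Ovos', 2)]
print(with_having)  # [('Ovos', 2)]
print(with_where)   # []
```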

Trying to understand how WHERE IN in a subquery works in Teradata SQL?

I'm trying to build a sub-query with a list in the WHERE clause. I have tried several variations and I think the problem is with the way I'm structuring the WHERE IN. Help is greatly appreciated!
SELECT a.ACCT_SK,
a.BTN,
a.PRODUCT_SET,
MAX(b.ORD_CREATD_DT)
FROM MM.MEC_ACCT_ATTR a, CDI_CRM.ORD_MSTR b
WHERE a.ACCT_SK=b.ACCT_SK AND a.BTN=b.BTN
(SELECT b.ACCT_SK, b.ORD_CREATD_DT
FROM CDI_CRM.ORD_MSTR b
WHERE b.ACCT_SK IN ('44347714',
'44023302',
'43604964'));
SELECT Failed. 3706: (-3706)Syntax error: expected something between '(' and the 'SELECT' keyword
The desired output is a table with the PRODUCT_SET for 50 ACCT_SKs, each with its most recent order date, matched on ACCT_SK and BTN.
Sample data and desired results would really help. Your query doesn't make much sense, but I suspect you want:
SELECT a.ACCT_SK, a.BTN, a.PRODUCT_SET,
MAX(o.ORD_CREATD_DT)
FROM MM.MEC_ACCT_ATTR a JOIN
CDI_CRM.ORD_MSTR o
ON a.ACCT_SK = o.ACCT_SK AND a.BTN = o.BTN
WHERE a.ACCT_SK IN ('44347714', '44023302', '43604964')
GROUP BY a.ACCT_SK, a.BTN, a.PRODUCT_SET;
This returns the columns you want for the three specified accounts.
Notes:
Always use proper, explicit, standard JOIN syntax. Never use commas in the FROM clause.
Your subquery simply makes no sense. It is not connected to anything else in the query.
You are using an aggregation function (MAX()) so your query is an aggregation query and needs a GROUP BY.
Use meaningful table aliases. a makes sense for an accounts table, but b does not make sense for an orders table.
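A rough way to try the shape of that query is with Python's built-in sqlite3 module. This is only a sketch: the table names acct_attr and ord_mstr stand in for the schema-qualified Teradata tables, the rows are invented, and dates are stored as ISO strings so that MAX() picks the latest one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-ins for MM.MEC_ACCT_ATTR and CDI_CRM.ORD_MSTR.
    CREATE TABLE acct_attr (ACCT_SK TEXT, BTN TEXT, PRODUCT_SET TEXT);
    CREATE TABLE ord_mstr (ACCT_SK TEXT, BTN TEXT, ORD_CREATD_DT TEXT);
    INSERT INTO acct_attr VALUES
        ('44347714', '555-0001', 'SET_A'),
        ('44023302', '555-0002', 'SET_B'),
        ('99999999', '555-0009', 'SET_Z');
    INSERT INTO ord_mstr VALUES
        ('44347714', '555-0001', '2020-01-05'),
        ('44347714', '555-0001', '2020-03-17'),
        ('44023302', '555-0002', '2019-11-30'),
        ('99999999', '555-0009', '2021-06-01');
""")

# Explicit JOIN, IN list on the account key, one output row per account group.
latest_orders = conn.execute("""
    SELECT a.ACCT_SK, a.BTN, a.PRODUCT_SET, MAX(o.ORD_CREATD_DT)
    FROM acct_attr a
    JOIN ord_mstr o
      ON a.ACCT_SK = o.ACCT_SK AND a.BTN = o.BTN
    WHERE a.ACCT_SK IN ('44347714', '44023302')
    GROUP BY a.ACCT_SK, a.BTN, a.PRODUCT_SET
    ORDER BY a.ACCT_SK
""").fetchall()

for row in latest_orders:
    print(row)
# ('44023302', '555-0002', 'SET_B', '2019-11-30')
# ('44347714', '555-0001', 'SET_A', '2020-03-17')
```

The account that is not in the IN list is filtered out before grouping, and each remaining account keeps only its most recent order date.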

Using SELECT DISTINCT or alternative with a 3 table query

Here I have an SQL statement which is retrieving all of the right stuff, but I need it to be DISTINCT.
So, for WEEK_NUMBER it's returning week_number = 1,1,1,1,1,1 etc.
I want it to condense into 1. It is a 3-table query and I'm not sure how I could include SELECT DISTINCT or an alternative. Any ideas?
SELECT WEEKLY_TIMECARD.*,DAILY_CALCULATIONS.*,EMPLOYEE_PROFILES.EMPLOYEE_NUMBER
FROM WEEKLY_TIMECARD, DAILY_CALCULATIONS, EMPLOYEE_PROFILES
WHERE EMPLOYEE_PROFILES.EMPLOYEE_NUMBER = WEEKLY_TIMECARD.EMPLOYEE_NUMBER
AND EMPLOYEE_PROFILES.EMPLOYEE_NUMBER = DAILY_CALCULATIONS.EMPLOYEE_NUMBER
AND WEEKLY_TIMECARD.WEEK_NUMBER = DAILY_CALCULATIONS.WEEK_NUMBER
Try this:
SELECT DISTINCT WEEKLY_TIMECARD.WEEK_NUMBER
FROM
WEEKLY_TIMECARD,
DAILY_CALCULATIONS,
EMPLOYEE_PROFILES
WHERE EMPLOYEE_PROFILES.EMPLOYEE_NUMBER = WEEKLY_TIMECARD.EMPLOYEE_NUMBER
AND EMPLOYEE_PROFILES.EMPLOYEE_NUMBER = DAILY_CALCULATIONS.EMPLOYEE_NUMBER
AND WEEKLY_TIMECARD.WEEK_NUMBER = DAILY_CALCULATIONS.WEEK_NUMBER
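You should add GROUP BY WEEK_NUMBER (qualified with a table name, since the column exists in two tables). In most databases the select list may then contain only the grouped column and aggregates.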
Since you are showing all the fields from tables WEEKLY_TIMECARD and DAILY_CALCULATIONS, if you use SELECT DISTINCT... you may end up with exactly the same situation you are encountering now (many rows with the same value).
Besides the DISTINCT and GROUP BY usage, you need to consider the following:
Do you really need all the fields? If you do, then maybe you do need the duplicate values. If you don't, just include the fields you need.
Do you need to aggregate data, or do you only need to deduplicate the values? If you need to aggregate data, you must use GROUP BY and the appropriate aggregate functions. If you don't need to aggregate data, I would advise you not to use GROUP BY, because it can make your query run very slowly (this may depend on which RDBMS you are using).
Whichever solution you choose, be sure your tables are properly indexed.
Besides that, I would use INNER JOIN to explicitly define the relations between your data (rather than implicitly defining them using WHERE conditions)... but that's my personal preference.
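Here is a small sketch of that point using Python's built-in sqlite3 module. The table names come from the question, but the columns (including WORK_DAY) and the rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed minimal columns; the real tables are not shown in the question.
    CREATE TABLE EMPLOYEE_PROFILES (EMPLOYEE_NUMBER INTEGER);
    CREATE TABLE WEEKLY_TIMECARD (EMPLOYEE_NUMBER INTEGER, WEEK_NUMBER INTEGER);
    CREATE TABLE DAILY_CALCULATIONS (
        EMPLOYEE_NUMBER INTEGER, WEEK_NUMBER INTEGER, WORK_DAY TEXT);
    INSERT INTO EMPLOYEE_PROFILES VALUES (7);
    INSERT INTO WEEKLY_TIMECARD VALUES (7, 1);
    INSERT INTO DAILY_CALCULATIONS VALUES (7, 1, 'Mon'), (7, 1, 'Tue'), (7, 1, 'Wed');
""")

# The same three-table relationship, written with explicit INNER JOINs.
JOINS = """
    FROM EMPLOYEE_PROFILES ep
    INNER JOIN WEEKLY_TIMECARD wt
        ON ep.EMPLOYEE_NUMBER = wt.EMPLOYEE_NUMBER
    INNER JOIN DAILY_CALCULATIONS dc
        ON ep.EMPLOYEE_NUMBER = dc.EMPLOYEE_NUMBER
       AND wt.WEEK_NUMBER = dc.WEEK_NUMBER
"""

# DISTINCT over all columns: the daily rows differ, so nothing collapses.
all_columns = conn.execute("SELECT DISTINCT wt.*, dc.*" + JOINS).fetchall()

# DISTINCT over just the column of interest: one row per week.
weeks = conn.execute("SELECT DISTINCT wt.WEEK_NUMBER" + JOINS).fetchall()

print(len(all_columns))  # 3
print(weeks)             # [(1,)]
```

DISTINCT applies to the whole selected row, so it only condenses WEEK_NUMBER when the other selected columns do not vary.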

Select Statement with Distinct returning multiple rows and need only first result

I am having a challenge with my query returning multiple results.
SELECT DISTINCT gpph.id, gpph.cname, gc2a.assetfilename, gpph.alternateURL
FROM [StepMirror].[dbo].[stepview_nwppck_ngn_getpimproducthierarchy] gpph
INNER JOIN [StepMirror].[dbo].[stepview_nwppck_ngn_getclassification2assetrefs] gc2a
ON gpph.id=gc2a.id
WHERE gpph.subtype='Level_4' AND gpph.parentId=@ID AND gc2a.assettype='Primary Image'
A record, 5679599, has 2 'Primary Images' and is returning 2 results for that id, but I only need the first result back. Is there any way to do this in the current query? Do I need to write multiple queries?
I need some direction on how to constrain the results to only 1 result on Primary Image. I have looked at a ton of similar questions, but most of them only need 'distinct' added to the beginning of the query rather than a change to the where clause.
Edit: This problem is created by a user inputting 2 Primary Images on one record in the database. My business requirements only state to take the first result.
Any help would be awesome!
Given that the choice of which to return is arbitrary, we can just use an aggregate on the value. This then needs a GROUP BY clause, which eliminates the need for the DISTINCT.
SELECT gpph.id, gpph.cname, max(gc2a.assetfilename), gpph.alternateURL
FROM [StepMirror].[dbo].[stepview_nwppck_ngn_getpimproducthierarchy] gpph
INNER JOIN [StepMirror].[dbo].[stepview_nwppck_ngn_getclassification2assetrefs] gc2a
ON gpph.id=gc2a.id
WHERE gpph.subtype='Level_4' AND gpph.parentId=@ID AND gc2a.assettype='Primary Image'
GROUP BY gpph.id, gpph.cname, gpph.alternateURL
In this instance, max(gc2a.assetfilename) gives you the alphabetically highest value when there is more than one record. It's not the ideal choice: a timestamp recording the order of the records would be more helpful, since the word 'first' would then have a clear meaning.
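The aggregate approach can be tried out with Python's built-in sqlite3 module. This is a sketch only: product_hierarchy and asset_refs are hypothetical stand-ins for the two SQL Server views, the rows are invented, and the parent id is passed as a bound parameter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-ins for the two views in the question.
    CREATE TABLE product_hierarchy (
        id INTEGER, cname TEXT, subtype TEXT, parentId INTEGER, alternateURL TEXT);
    CREATE TABLE asset_refs (id INTEGER, assetfilename TEXT, assettype TEXT);
    INSERT INTO product_hierarchy VALUES
        (5679599, 'Widget', 'Level_4', 100, '/widget'),
        (5679600, 'Gadget', 'Level_4', 100, '/gadget');
    -- Product 5679599 has two primary images, as described in the question.
    INSERT INTO asset_refs VALUES
        (5679599, 'a.jpg', 'Primary Image'),
        (5679599, 'b.jpg', 'Primary Image'),
        (5679600, 'c.jpg', 'Primary Image'),
        (5679600, 'z.jpg', 'Thumbnail');
""")

# One row per product: MAX() picks the alphabetically highest file name.
rows = conn.execute("""
    SELECT p.id, p.cname, MAX(a.assetfilename), p.alternateURL
    FROM product_hierarchy p
    INNER JOIN asset_refs a ON p.id = a.id
    WHERE p.subtype = 'Level_4' AND p.parentId = ? AND a.assettype = 'Primary Image'
    GROUP BY p.id, p.cname, p.alternateURL
    ORDER BY p.id
""", (100,)).fetchall()

for row in rows:
    print(row)
# (5679599, 'Widget', 'b.jpg', '/widget')
# (5679600, 'Gadget', 'c.jpg', '/gadget')
```

Note that 'b.jpg' wins only because of alphabetical order, not because it was entered first.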
Replace DISTINCT with GROUP BY, and aggregate the column that differs between the duplicate rows (assetfilename). Note that an aggregate such as MAX() or MIN() cannot be used in the WHERE clause:
SELECT gpph.id, gpph.cname, MIN(gc2a.assetfilename) AS assetfilename, gpph.alternateURL
FROM [StepMirror].[dbo].[stepview_nwppck_ngn_getpimproducthierarchy] gpph
INNER JOIN [StepMirror].[dbo].[stepview_nwppck_ngn_getclassification2assetrefs] gc2a
ON gpph.id=gc2a.id
WHERE gpph.subtype='Level_4' AND gpph.parentId=@ID AND gc2a.assettype='Primary Image'
GROUP BY gpph.id, gpph.cname, gpph.alternateURL

Why is selecting specified columns, and all, wrong in Oracle SQL?

Say I have a select statement that goes..
select * from animals
That gives a query result of all the columns in the table.
Now, say the 42nd column of the table animals is is_parent, and I want to return it at the front of my results, so I can see it more easily. But I also want all the other columns.
select is_parent, * from animals
This returns ORA-00936: missing expression.
The same statement works fine in Sybase, and I know that adding a table alias to the animals table gets it to work (select is_parent, ani.* from animals ani), but why does Oracle need a table alias to work out the select?
Actually, it's easy to solve the original problem. You just have to qualify the *.
select is_parent, animals.* from animals;
should work just fine. Aliases for the table names also work.
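The qualified form is easy to try with Python's built-in sqlite3 module. This is only a sketch with an invented three-column animals table; SQLite is more permissive than Oracle, so it cannot reproduce ORA-00936, but it does show what the qualified star returns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Invented, much smaller version of the animals table.
    CREATE TABLE animals (name TEXT, gender TEXT, is_parent INTEGER);
    INSERT INTO animals VALUES ('Rex', 'M', 1);
""")

# Qualify the star with the table name: is_parent first, then every column.
cur = conn.execute("SELECT is_parent, animals.* FROM animals")
columns = [d[0] for d in cur.description]
rows = cur.fetchall()

print(columns)  # ['is_parent', 'name', 'gender', 'is_parent']
print(rows)     # [(1, 'Rex', 'M', 1)]
```

The selected column appears twice, once up front and once in its normal position within the expanded star.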
There is no merit in doing this in production code. We should explicitly name the columns we want rather than using the SELECT * construct.
As for ad hoc querying, get yourself an IDE - SQL Developer, TOAD, PL/SQL Developer, etc - which allows us to manipulate queries and result sets without needing extensions to SQL.
Good question, I've often wondered this myself but have then accepted it as one of those things...
A similar problem is this:
sql>select geometrie.SDO_GTYPE from ngg_basiscomponent
ORA-00904: "GEOMETRIE"."SDO_GTYPE": invalid identifier
where geometrie is a column of type mdsys.sdo_geometry.
Add an alias and the thing works.
sql>select a.geometrie.SDO_GTYPE from ngg_basiscomponent a;
Lots of good answers so far on why select * shouldn't be used, and they're all perfectly correct. However, I don't think any of them answers the original question of why the particular syntax fails.
Sadly, I think the reason is... "because it doesn't".
I don't think it's anything to do with single-table vs. multi-table queries:
This works fine:
select *
from
person p inner join user u on u.person_id = p.person_id
But this fails:
select p.person_id, *
from
person p inner join user u on u.person_id = p.person_id
While this works:
select p.person_id, p.*, u.*
from
person p inner join user u on u.person_id = p.person_id
It might be some historical compatibility thing with 20-year old legacy code.
Another for the "but why!!!" bucket, along with why can't you group by an alias?
The use case for the alias.* format is as follows:
select parent.*, child.col
from parent join child on parent.parent_id = child.parent_id
That is, selecting all the columns from one table in a join, plus (optionally) one or more columns from other tables.
The fact that you can use it to select the same column twice is just a side-effect. There is no real point to selecting the same column twice and I don't think laziness is a real justification.
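That use case can be sketched with Python's built-in sqlite3 module. The parent and child tables and the column col come from the query above, while the remaining columns and the rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed layouts; only parent_id and col appear in the original query.
    CREATE TABLE parent (parent_id INTEGER, name TEXT);
    CREATE TABLE child (child_id INTEGER, parent_id INTEGER, col TEXT);
    INSERT INTO parent VALUES (1, 'p1'), (2, 'p2');
    INSERT INTO child VALUES (10, 1, 'x'), (11, 1, 'y');
""")

# All columns of parent, plus a single column from child.
rows = conn.execute("""
    SELECT parent.*, child.col
    FROM parent JOIN child ON parent.parent_id = child.parent_id
    ORDER BY child.child_id
""").fetchall()

print(rows)  # [(1, 'p1', 'x'), (1, 'p1', 'y')]
```

Only the parent columns are expanded by the star, so columns added to child later do not change the shape of the result.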
In the real world, select * is only dangerous when columns are referred to by index number after retrieval rather than by name. The bigger problem is inefficiency when not all columns are required in the result set (network traffic, CPU and memory load).
Of course, if you're adding columns from other tables (as is the case in this example), it can be dangerous, as these tables may over time acquire columns with matching names: select *, x would then fail if a column x is added to a table that previously didn't have it.
why must Oracle need a table alias to be able to work out the select
Teradata requires the same. As both are quite old (maybe better to call them mature :-) DBMSes, this might be for historical reasons.
My usual explanation is: an unqualified * means everything/all columns and the parser/optimizer is simply confused because you request more than everything.