SQLite: distinguish between table and column alias - sql

Can SQLite distinguish between a column from some aliased table, e.g. table1.column and a column that is aliased with the same name, i.e. column, in the SELECT statement?
This is relevant because I need to refer to the column that I construct in the SELECT statement later on in a HAVING clause, but must not confuse it with the column in the aliased table. To my knowledge, I cannot alias the table to be constructed in my SELECT statement (without resorting to some nasty workaround like SELECT * FROM (SELECT ...) AS alias) to ensure both are distinguishable.
Here's a stripped down version of the code I am concerned with:
SELECT
a.entity,
b.DATE,
TOTAL(a.dollar_amount*b.ret_usd)/TOTAL(a.dollar_amount) AS ret_usd
FROM holdings a
LEFT JOIN returns b
ON a.stock = b.stock AND
a.DATE = b.DATE
GROUP BY
a.entity,
b.DATE
HAVING
ret_usd NOT NULL
Essentially, I want to get rid of groups for which I cannot find any returns and thus would show up with NULL values. I am not using an INNER JOIN because in my production code I merge multiple types of returns - for some of which I may have no data. I only want to drop those groups for which I have no returns for any of the return types.
To my understanding, the SQLite documentation does not address this issue.

LEFT JOIN all the return tables, then add a WHERE clause, something like
COALESCE(b.ret_usd, c.ret_usd, d.ret_usd, ...) IS NOT NULL
You might need a similar strategy to determine which ret_usd to use in the TOTAL. FYI, TOTAL never returns NULL: it returns 0.0 when every input is NULL, unlike SUM.
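The behaviour described in this answer can be checked with a small script. This is only a sketch using Python's built-in sqlite3 module; the sample rows, the second return table returns_local with its column ret_local, and the output alias weighted_ret are assumptions made up for illustration and do not come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE holdings (entity TEXT, stock TEXT, date TEXT, dollar_amount REAL);
    CREATE TABLE returns (stock TEXT, date TEXT, ret_usd REAL);
    -- hypothetical second return type, standing in for "multiple types of returns"
    CREATE TABLE returns_local (stock TEXT, date TEXT, ret_local REAL);
    INSERT INTO holdings VALUES
        ('fund1', 'AAA', '2020-01-31', 100.0),
        ('fund1', 'BBB', '2020-01-31', 300.0),
        ('fund2', 'CCC', '2020-01-31', 50.0);  -- CCC has no return of any type
    INSERT INTO returns VALUES
        ('AAA', '2020-01-31', 0.10),
        ('BBB', '2020-01-31', 0.02);
    INSERT INTO returns_local VALUES ('AAA', '2020-01-31', 0.08);
""")

# For a group whose inputs are all NULL, TOTAL() gives 0.0 while SUM() gives NULL.
agg_row = conn.execute("""
    SELECT TOTAL(b.ret_usd), SUM(b.ret_usd)
    FROM holdings a
    LEFT JOIN returns b ON a.stock = b.stock AND a.date = b.date
    WHERE a.entity = 'fund2'
""").fetchone()

# Filter in WHERE, before grouping: rows with no return of any type are dropped.
weighted_rows = conn.execute("""
    SELECT a.entity,
           a.date,
           TOTAL(a.dollar_amount * b.ret_usd) / TOTAL(a.dollar_amount) AS weighted_ret
    FROM holdings a
    LEFT JOIN returns b ON a.stock = b.stock AND a.date = b.date
    LEFT JOIN returns_local c ON a.stock = c.stock AND a.date = c.date
    WHERE COALESCE(b.ret_usd, c.ret_local) IS NOT NULL
    GROUP BY a.entity, a.date
""").fetchall()

print(agg_row)
print(weighted_rows)
```

Note that a WHERE filter also removes the unmatched holdings from the denominator, which may or may not be what the weighting is supposed to do.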

Related

SQL: Is the newly selected temp table in FROM clause not passed to the sub-query in WHERE clause?

Here's a question I'm having trouble with. Basically, there are originally two tables: "a" and "b". I first "joined" them (without using a JOIN clause) with some conditions: a.id=b.id and b.class='xxx'. Then I named that temp table A, and I want to select the rows with the highest income among the people in A.
The error returned is "the relation A doesn't exist", and the error marker points to the clause "select max(A.income) from A". Therefore, I suspect that the temp table A created in the FROM clause is not passed to the sub-query in the WHERE clause?
select * from
(select * from a,b where a.id=b.id and b.class='xxx') as A
where A.income = all
(select max(A.income) from A)
I've encountered this problem while using Postgres, but I think it may also happen in other databases like MySQL or MSSQL. Are there any possible solutions that do not use a WITH clause? Thanks. (The reason why I say "sub-query" instead of "query" is that I've tried conditions like "where A.income>1000" and they all work.)
The problem is that your alias a hides the table with the same name. Use a different alias name.
It is unclear whether you want to select from the original table a in the subquery or from the alias. If it is the former, then the above will solve your problem.
If you want to reference the alias in the subquery, you had better use a common table expression:
WITH alias_name AS (/* your FROM subquery */)
SELECT ... /* alias_name can be used in a subquery here */
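A minimal sketch of the common table expression approach follows. It runs against SQLite through Python's sqlite3 module rather than Postgres, so it compares against a scalar subquery instead of using = ALL; the CTE name joined and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER, income INTEGER);
    CREATE TABLE b (id INTEGER, class TEXT);
    INSERT INTO a VALUES (1, 1000), (2, 3000), (3, 9000);
    INSERT INTO b VALUES (1, 'xxx'), (2, 'xxx'), (3, 'yyy');
""")

# The CTE is named once and is visible both in the outer FROM and in the subquery.
top_rows = conn.execute("""
    WITH joined AS (
        SELECT a.id, a.income, b.class
        FROM a
        JOIN b ON a.id = b.id
        WHERE b.class = 'xxx'
    )
    SELECT id, income
    FROM joined
    WHERE income = (SELECT max(income) FROM joined)
""").fetchall()

print(top_rows)
```

A derived table in FROM, by contrast, can be referenced by its alias for column access but cannot be selected from again inside a subquery, which is why the original query fails.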
You can try the below -
select * from a join b on a.id=b.id where b.class='xxx'
and income = all (select max(income) from a join b on a.id=b.id where b.class='xxx')

What table will be taken if table alias is not provided?

I am analyzing an SQL statement and there is something about aliases that isn't really clear to me, so I would like to ask if someone can explain it. This is what the statement looks like:
SELECT
a.RecordID
, a.Account
, b.RecordID
, c.SomeField as AlternativeFieldName
FROM TableA a
LEFT JOIN TableB b
ON a.RecordID=b.RecordID
LEFT JOIN TableC c
ON b.RecordID=c.RecordID
WHERE a.DayFrom >= YYYYMMDD and a.DayFrom < YYYYMMDD
AND b.Field1 is null
AND Field2 = 'SOME_VALUE'
As you can see, aliases are provided for all three tables in the statement and are always used in the select list as well as in the joins. In the WHERE clause, however, there is one field from one of the tables above for which no alias is given. I wonder if this is correct, and if it is, which table does SQL take as the source, or does it throw an error?
On this page, I've tried something similar and it actually worked, although I expected an error. I thought that SQL forces you to use aliases when joining multiple tables. Here is the statement:
SELECT *
FROM Customers c
JOIN Orders o
ON c.CustomerID = o.CustomerID
WHERE OrderDate = '1996-07-04'
Thx in advance, cheers!
If SQL Server (or any database) finds an unqualified column name, then it looks to see which tables/subqueries in the FROM clause or outer queries might be providing it.
If the column is in exactly one table/subquery in the FROM clause, then the column is assumed to come from the table/subquery.
If the column is in multiple tables/subqueries in the FROM clause, then the query returns an error.
If the column does not exist in the FROM clause nor in any outer queries, the query returns an error.
If the column does not exist in the FROM clause, but does in an outer query, then that reference is used.
These rules go by the name "scoping". That is a common term in computer languages for figuring out the value of a variable.
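The first two rules can be tried out with a short script. This is a sketch against SQLite via Python's sqlite3 module, not SQL Server; the table contents are made up, and the exact error text differs between databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (CustomerID INTEGER, Name TEXT);
    CREATE TABLE Orders (OrderID INTEGER, CustomerID INTEGER, OrderDate TEXT);
    INSERT INTO Customers VALUES (1, 'Ann');
    INSERT INTO Orders VALUES (10, 1, '1996-07-04');
""")

# OrderDate exists in exactly one table, so the unqualified name resolves to Orders.
ok_rows = conn.execute("""
    SELECT o.OrderID
    FROM Customers c
    JOIN Orders o ON c.CustomerID = o.CustomerID
    WHERE OrderDate = '1996-07-04'
""").fetchall()

# CustomerID exists in both tables, so the unqualified name is rejected.
try:
    conn.execute("""
        SELECT CustomerID
        FROM Customers c
        JOIN Orders o ON c.CustomerID = o.CustomerID
    """)
    ambiguous_error = None
except sqlite3.OperationalError as exc:
    ambiguous_error = str(exc)

print(ok_rows)
print(ambiguous_error)
```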
No error will be raised for unqualified column names as long as the column name is not ambiguous. In the case of ambiguous names, the column name must be qualified with the table name or alias to avoid an error.
Note there is also the notion of scope that can become an issue when column names are not qualified. Consider this construct:
WHERE b.Field3 IN(
SELECT Field4
FROM TableD
)
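If Field4 exists in TableD, the desired results will be returned. But if Field4 does not exist in TableD and does exist in one of the outer tables, no error is raised: the name binds to the outer row, so the predicate merely compares b.Field3 with that row's own Field4 value. It is then true whenever the two are equal (for every non-NULL row, if they are the same column) and at least one row exists in TableD.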
In short, the best practice is to qualify column names in multi-table queries.
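The scoping pitfall is easy to reproduce. The sketch below uses SQLite through Python's sqlite3 module; the column Other in TableD and the sample rows are assumptions chosen so that TableD has no Field4 of its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableB (Field3 INTEGER, Field4 INTEGER);
    CREATE TABLE TableD (Other INTEGER);  -- deliberately has no Field4 column
    INSERT INTO TableB VALUES (1, 1), (2, 5), (3, NULL);
    INSERT INTO TableD VALUES (42);
""")

# Unqualified Field4 is not found in TableD, so it binds to the outer TableB row.
# The query runs without error and compares each row's Field3 with its own Field4.
silent_rows = conn.execute("""
    SELECT b.Field3
    FROM TableB b
    WHERE b.Field3 IN (SELECT Field4 FROM TableD)
    ORDER BY b.Field3
""").fetchall()

# Qualifying the column exposes the mistake immediately.
try:
    conn.execute("""
        SELECT b.Field3
        FROM TableB b
        WHERE b.Field3 IN (SELECT d.Field4 FROM TableD d)
    """)
    qualified_error = None
except sqlite3.OperationalError as exc:
    qualified_error = str(exc)

print(silent_rows)
print(qualified_error)
```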
Any RDBMS executing SQL queries with joins will try to find column names in all referenced tables. If it finds only one column with that name, it simply executes the query; otherwise it throws an ambiguity error. Alias names are used to avoid ambiguity.

How to drop one join key when joining two tables

I have two tables. Both have lot of columns. Now I have a common column called ID on which I would join.
Now since this variable ID is present in both the tables if I do simply this
select a.*,b.*
from table_a as a
left join table_b as b on a.id=b.id
This will give an error as id is duplicate (present in both the tables and getting included for both).
I don't want to write down separately each column of b in the select statement. I have lots of columns and that is a pain. Can I rename the ID column of b in the join statement itself similar to SAS data merge statements?
I am using Postgres.
Postgres would not give you an error for duplicate output column names, but some clients do. (Duplicate names are also not very useful.)
Either way, use the USING clause as join condition to fold the two join columns into one:
SELECT *
FROM tbl_a a
LEFT JOIN tbl_b b USING (id);
While you join the same table (self-join) there will be more duplicate column names. The query would make hardly any sense to begin with. This starts to make sense for different tables. Like you stated in your question to begin with: I have two tables ...
To avoid all duplicate column names, you have to list them in the SELECT clause explicitly - possibly dealing out column aliases to get both instances with different names.
Or you can use a NATURAL join - if that fits your unexplained use case:
SELECT *
FROM tbl_a a
NATURAL LEFT JOIN tbl_b b;
This joins on all columns that share the same name and folds those automatically - exactly the same as listing all common column names in a USING clause. You need to be aware of rules for possible NULL values ...
Details in the manual.
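The folding of the join column can be seen by comparing the output column lists. The sketch below uses SQLite through Python's sqlite3 module as a stand-in for Postgres (both fold USING and NATURAL join columns under SELECT *); the columns x and y, the sample rows, and the helper column_names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl_a (id INTEGER, x TEXT);
    CREATE TABLE tbl_b (id INTEGER, y TEXT);
    INSERT INTO tbl_a VALUES (1, 'a1'), (2, 'a2');
    INSERT INTO tbl_b VALUES (1, 'b1');
""")


def column_names(sql):
    """Return the output column names of a query."""
    return [col[0] for col in conn.execute(sql).description]


# ON keeps both id columns; USING and NATURAL fold them into one.
on_cols = column_names("SELECT * FROM tbl_a a LEFT JOIN tbl_b b ON a.id = b.id")
using_cols = column_names("SELECT * FROM tbl_a a LEFT JOIN tbl_b b USING (id)")
natural_cols = column_names("SELECT * FROM tbl_a a NATURAL LEFT JOIN tbl_b b")

print(on_cols)
print(using_cols)
print(natural_cols)
```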

Getting way more results than expected in SQL left join query

My code is such:
SELECT COUNT(*)
FROM earned_dollars a
LEFT JOIN product_reference b ON a.product_code = b.product_code
WHERE a.activity_year = '2015'
I'm trying to match two tables based on their product codes. I would expect the same number of results back from this as total records in table a (with a year of 2015). But for some reason I'm getting close to 3 million.
Table a has about 40,000,000 records and table b has 2000. When I run this statement without the join I get 2,500,000 results, so I would expect this even with the left join, but somehow I'm getting 300,000,000. Any ideas? I even referred to the diagram in this post.
It means either your left join is using only part of the foreign key, which causes row multiplication, or there are simply duplicate rows in the joined table.
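Row multiplication from duplicate keys can be reproduced on a tiny scale. The sketch below uses SQLite through Python's sqlite3 module; the sample rows and the description column are invented for illustration. The last query is one way to look for the offending keys in the joined table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE earned_dollars (product_code TEXT, activity_year TEXT);
    CREATE TABLE product_reference (product_code TEXT, description TEXT);
    INSERT INTO earned_dollars VALUES
        ('P1', '2015'), ('P1', '2015'), ('P2', '2015'), ('P3', '2014');
    -- P1 appears three times in the reference table
    INSERT INTO product_reference VALUES
        ('P1', 'widget'), ('P1', 'widget v2'), ('P1', 'widget v3');
""")

plain_count = conn.execute("""
    SELECT COUNT(*) FROM earned_dollars a WHERE a.activity_year = '2015'
""").fetchone()[0]

# Each P1 row in a matches three rows in b, so it is counted three times.
joined_count = conn.execute("""
    SELECT COUNT(*)
    FROM earned_dollars a
    LEFT JOIN product_reference b ON a.product_code = b.product_code
    WHERE a.activity_year = '2015'
""").fetchone()[0]

# Keys that occur more than once in the joined table are the source of the blow-up.
dup_codes = conn.execute("""
    SELECT product_code, COUNT(*)
    FROM product_reference
    GROUP BY product_code
    HAVING COUNT(*) > 1
""").fetchall()

print(plain_count, joined_count, dup_codes)
```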
use COUNT(DISTINCT a.product_code)
What is the question you are trying to answer with the T-SQL?
instead of select count(*) try select a.product_code, b.product_code. That will show you which records match and which don't.
Should also add a where b.product_code is not null. That should exclude the records that don't match.
b is the parent table and a is the child table? try a right join instead.
Or use the table's unique identifier, i.e.
SELECT COUNT(a.earned_dollars_id)
Not sure what your datamodel looks like and how it is structured, but i'm guessing you only care about earned_dollars?
SELECT COUNT(*)
FROM earned_dollars a
WHERE a.activity_year = '2015'
and exists (select 1 from product_reference b where a.product_code = b.product_code)

Use two DISTINCT statements in SQL

I have combined two different tables together, one side is named DynDom and the other is CATH. I am trying to remove duplicates from that table such as below:
However, if i select distinct Dyndom pdbcode from the table, it returns distinct values of that pdbcode.
Based on the pictures above, I commented out the DynDom/CATH columns in the table and ran the query separately for DynDom/CATH and it returned those values accordingly, which is what i need and i was wondering if it's possible for me to use 2 distinct statements to return distinct values of the entire table based on the pdbcode.
Here's my code :
select DISTINCT
cath_dyndom_table_2."DYNDOM_DOMAINID",
cath_dyndom_table_2."DYNDOM_DSTART",
cath_dyndom_table_2."DYNDOM_DEND",
cath_dyndom_table_2."DYNDOM_CONFORMERID",
cath_dyndom_table_2.pdbcode,
cath_dyndom_table_2."DYNDOM_ChainID",
cath_dyndom_table_2.cath_pdbcode,
cath_dyndom_table_2."CATH_BEGIN",
cath_dyndom_table_2."CATH_END"
from
cath_dyndom_table_2
where
pdbcode = '2hun'
order by
cath_dyndom_table_2."DYNDOM_DOMAINID",
cath_dyndom_table_2."DYNDOM_DSTART",
cath_dyndom_table_2."DYNDOM_DEND",
cath_dyndom_table_2.pdbcode,
cath_dyndom_table_2.cath_pdbcode,
cath_dyndom_table_2."CATH_BEGIN",
cath_dyndom_table_2."CATH_END";
In the end, i would like to search domains from DynDom and CATH, based on the pdbcode and return the rows without having duplicate values.
Thank you.
UPDATE :
This is my VIEW table that i have done.
CREATE VIEW cath_dyndom_table AS
SELECT
r.domainid AS "DYNDOM_DOMAINID",
r.DomainStart AS "DYNDOM_DSTART",
r.Domain_End AS "DYNDOM_DEND",
r.ddid AS "DYN_DDID",
r.confid AS "DYNDOM_CONFORMERID",
r.pdbcode,
r.chainid AS "DYNDOM_ChainID",
d.cath_pdbcode,
d.cathbegin AS "CATH_BEGIN",
d.cathend AS "CATH_END"
FROM dyndom_domain_table r
FULL OUTER JOIN cath_domains d ON d.cath_pdbcode::character(4) = r.pdbcode
ORDER BY confid ASC;
What you are getting is the cartesian product of the two tables.
In order to get one line without duplicates, you need to have a 1-to-1 relation between both tables.
You can see HERE what are cartesian joins and HERE how to avoid them!
It sounds as though you want a UNION of domain name and ranges from each table - this can be achieved like so:
SELECT DYNDOM_DOMAINID, DYNDOM_DSTART, DYNDOM_DEND
FROM DynDom
UNION
SELECT RTRIM(cath_pdbcode), CATH_BEGIN, CATH_END
FROM CATH
This should eliminate exact duplicates (i.e. where the domain name, start and end are all identical) but will not eliminate duplicate domain names with different ranges. If these exist, you will need to decide how to handle them (retain them as separate entries, combine them with lowest start and highest end, or whatever other option is preferred).
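The effect of UNION on exact and near duplicates can be illustrated with a few made-up rows. This is a sketch against SQLite through Python's sqlite3 module; the table layouts follow the query above, but the data, including the trailing space that RTRIM removes, is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE DynDom (DYNDOM_DOMAINID TEXT, DYNDOM_DSTART INTEGER, DYNDOM_DEND INTEGER);
    CREATE TABLE CATH (cath_pdbcode TEXT, CATH_BEGIN INTEGER, CATH_END INTEGER);
    INSERT INTO DynDom VALUES
        ('2hunA', 1, 100), ('2hunA', 1, 100), ('2hunA', 101, 200);
    INSERT INTO CATH VALUES
        ('2hunA ', 1, 100), ('2hunB', 5, 50);
""")

# UNION (unlike UNION ALL) removes rows that are identical in every column.
union_rows = conn.execute("""
    SELECT DYNDOM_DOMAINID, DYNDOM_DSTART, DYNDOM_DEND
    FROM DynDom
    UNION
    SELECT RTRIM(cath_pdbcode), CATH_BEGIN, CATH_END
    FROM CATH
    ORDER BY 1, 2
""").fetchall()

print(union_rows)
```

The domain 2hunA still appears twice because its two ranges differ, which is the case the paragraph above leaves to the reader to resolve.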
EDIT: Actually, I believe you can get the desired results simply by changing the JOIN ON condition in your view to be:
FULL OUTER JOIN cath_domains d
ON d.cath_pdbcode::character(5) = r.pdbcode || r.chainid AND
r.DomainStart <= d.cathbegin AND
r.Domain_End >= d.cathend
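To see why the tighter join condition cuts down the duplicates, the sketch below compares row counts for a join on the PDB code alone against a join on code plus chain plus range containment. It uses SQLite through Python's sqlite3 module with a plain inner JOIN, and substr as a stand-in for the Postgres ::character(4) cast; all sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dyndom_domain_table (
        pdbcode TEXT, chainid TEXT, DomainStart INTEGER, Domain_End INTEGER);
    CREATE TABLE cath_domains (cath_pdbcode TEXT, cathbegin INTEGER, cathend INTEGER);
    INSERT INTO dyndom_domain_table VALUES
        ('2hun', 'A', 1, 100), ('2hun', 'A', 101, 200), ('2hun', 'B', 1, 80);
    INSERT INTO cath_domains VALUES
        ('2hunA', 10, 90), ('2hunA', 120, 180), ('2hunB', 5, 50);
""")

# Joining on the four-character code only pairs every DynDom row with every CATH row.
loose_count = conn.execute("""
    SELECT COUNT(*)
    FROM dyndom_domain_table r
    JOIN cath_domains d ON substr(d.cath_pdbcode, 1, 4) = r.pdbcode
""").fetchone()[0]

# Adding the chain and requiring the CATH range to lie inside the DynDom range
# leaves one CATH row per DynDom row in this sample.
tight_rows = conn.execute("""
    SELECT r.pdbcode, r.chainid, r.DomainStart, d.cathbegin, d.cathend
    FROM dyndom_domain_table r
    JOIN cath_domains d
      ON d.cath_pdbcode = r.pdbcode || r.chainid
     AND r.DomainStart <= d.cathbegin
     AND r.Domain_End >= d.cathend
    ORDER BY r.chainid, r.DomainStart
""").fetchall()

print(loose_count)
print(tight_rows)
```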