DB2 alias in WHERE clause - sql

I have a couple of DB2 tables, one for users and one for newsletters, and I want to select using an alias in the WHERE clause.
SELECT a.*, b.tech_id as user FROM users a
JOIN newsletter b ON b.tech_id = a.newsletter_id
WHERE timestamp(user) < current_timestamp
This is radically simplified so I can see what's going on, but I am getting an error that makes me think that the user alias isn't getting passed correctly:
ERROR: An invalid datetime format was detected; that is, an
invalid string representation or value was specified.
The user.tech_id is a string built from the datetime when the record was created, so it looks something like 20150210175040951186000000. I've verified that I can execute a timestamp(tech_id) successfully-- so it can't be the format of the field causing the problem.
Any ideas?
More information:
There are multiple newsletters per user. I need to get the most recent newsletter (by the tech_id) and check if that was created in the past week. So the more complex version would be something like:
SELECT a.*, b.tech_id as user FROM users a
JOIN newsletter b ON b.tech_id = a.newsletter_id
WHERE timestamp(max(user)) < current_timestamp
Is there a way to JOIN only on the most recent record?

The order of execution differs from the order in which the clauses are written. The FROM and WHERE clauses are evaluated before the SELECT clause, so the alias does not exist yet at the point where you are trying to use it.
You would have to "nest" part of the query so that the alias is defined before the WHERE clause is evaluated. In many cases it is easier not to use the alias at all.
Try:
WHERE timestamp(b.tech_id) < current_timestamp
The generic "order of execution" of SQL clauses is
FROM
JOINs (as part of the from clause)
WHERE
GROUP BY
HAVING
SELECT
ORDER BY
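The nesting approach mentioned above can be sketched as follows. This is only an illustration run against SQLite through Python's sqlite3 module, not DB2; the table contents and the cutoff value are made up, and the alias is called tech_user instead of user. Note that SQLite itself happens to tolerate SELECT aliases in WHERE, so the sketch shows only the portable form in which the alias is defined in a derived table first. USER is also a special register in DB2, which may be why the original query reported a datetime format error rather than an unknown column; that reading is an inference, not something the thread confirms.

```python
import sqlite3

# In-memory SQLite database standing in for the DB2 tables (sample data is invented).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, newsletter_id TEXT);
    CREATE TABLE newsletter (tech_id TEXT PRIMARY KEY);
    INSERT INTO newsletter VALUES ('20150210175040'), ('20990101000000');
    INSERT INTO users VALUES (1, 'ann', '20150210175040'),
                             (2, 'bob', '20990101000000');
""")

# The inner query runs its own SELECT first, so by the time the outer WHERE
# is evaluated, tech_user is an ordinary column of the derived table t.
nested_sql = """
    SELECT name, tech_user
    FROM (
        SELECT a.name, b.tech_id AS tech_user
        FROM users a
        JOIN newsletter b ON b.tech_id = a.newsletter_id
    ) AS t
    WHERE tech_user < '20200101000000'
    ORDER BY name
"""
nested_rows = conn.execute(nested_sql).fetchall()
print(nested_rows)
```

Only the user whose newsletter id sorts before the cutoff comes back.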
Is there a way to JOIN only on the most recent record?
A useful technique for this is using ROW_NUMBER() assuming your DB2 supports it, and would look something like this:
SELECT
a.*
, b.tech_id AS techuser
FROM users a
JOIN (
SELECT
*
, ROW_NUMBER() OVER (ORDER BY timestamp(tech_id) DESC) AS RN
FROM newsletter
) b
ON b.tech_id = a.newsletter_id
AND b.rn = 1
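This gives you just one row from newsletter, and the descending order makes it the most recent one, assuming timestamp(tech_id) works as described. Because the window has no PARTITION BY, that is the single newest row in the whole table; to get the newest row per user, partition the window by whatever column identifies the user.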
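A per-user variant of the ROW_NUMBER() idea might look like the sketch below. It assumes a hypothetical user_id column on newsletter that links each newsletter to its user (the thread does not show such a column), and it runs on SQLite via Python's sqlite3 rather than DB2. Window functions need SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    -- user_id is a hypothetical linking column, not part of the original schema
    CREATE TABLE newsletter (tech_id TEXT PRIMARY KEY, user_id INTEGER);
    INSERT INTO users VALUES (1, 'ann'), (2, 'bob');
    INSERT INTO newsletter VALUES
        ('20150101000000', 1), ('20150210175040', 1),
        ('20140505120000', 2), ('20150301090000', 2);
""")

# PARTITION BY restarts the numbering for every user, so rn = 1 marks the
# newest newsletter of each user instead of the newest row in the table.
latest_sql = """
    SELECT a.name, b.tech_id AS techuser
    FROM users a
    JOIN (
        SELECT n.*,
               ROW_NUMBER() OVER (PARTITION BY user_id
                                  ORDER BY tech_id DESC) AS rn
        FROM newsletter n
    ) b ON b.user_id = a.id AND b.rn = 1
    ORDER BY a.name
"""
latest_rows = conn.execute(latest_sql).fetchall()
print(latest_rows)
```

Each user appears once, paired with their latest tech_id.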

To get the most recent newsletter of a user, consider ordering the joined result newest first, then selecting the top record (in DB2 you would use FETCH FIRST 1 ROW ONLY):
SELECT a.*, b.tech_id as user
FROM users a
INNER JOIN newsletter b ON b.tech_id = a.newsletter_id
ORDER BY b.tech_id DESC
FETCH FIRST 1 ROW ONLY;
Alternatively, you can use a subquery in WHERE clause that aggregates the max user:
SELECT a.*, b.tech_id as user
FROM users a
INNER JOIN newsletter b ON b.tech_id = a.newsletter_id
WHERE b.tech_id IN (
SELECT Max(n.tech_id) as maxUser
FROM users u
INNER JOIN newsletter n ON n.tech_id = u.newsletter_id)
I left out the condition timestamp(user) < current_timestamp, as data stored in the database will always be earlier than the current time (i.e., now).

Related

Workaround for a correlated subquery

I need to run the following join without using a correlated subquery, as I am restricted to either using Hive or Presto, both of which fail due to my using a correlated subquery.
I have worked this down to a MWE. I have a table of each user and their 18th birthdays. I have another table of each time each user visited a movie theatre. I want to merge in only the last time a user visited my movie cinema. The code that would work on native SQL is below.
What is the most efficient workaround that does not require me to join every instance of the user visiting the movie theatre (it is far too large)?
SELECT
people.*,
tickets.uid,
tickets.date
FROM all_customers as people
JOIN tkting as tickets
on people.uid = tickets.uid
and tickets.date = (select
lastvisit.date
from tickets as lastvisit
where
lastvisit.uid = people.uid
and lastvisit.date < people.birthday_18
order by lastvisit.date asc
limit 1)
Instead of this inner query:
SELECT lastvisit.date
...
ORDER BY lastvisit.date ASC
LIMIT 1
you can try with:
SELECT min(lastvisit.date)
...
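To check that swapping ORDER BY ... ASC LIMIT 1 for MIN() does not change the result, here is a small sketch on SQLite through Python's sqlite3 (not Hive or Presto, so whether either engine accepts these forms is untested here). The sample rows are invented. The third query, a plain join with GROUP BY and no subquery at all, is my own addition rather than part of the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE all_customers (uid INTEGER PRIMARY KEY, birthday_18 TEXT);
    CREATE TABLE tkting (uid INTEGER, date TEXT);
    INSERT INTO all_customers VALUES (1, '2015-06-01'), (2, '2016-01-01');
    INSERT INTO tkting VALUES
        (1, '2015-01-10'), (1, '2015-03-20'), (1, '2015-07-04'),
        (2, '2015-12-24');
""")

# Original shape: correlated subquery with ORDER BY ... LIMIT 1.
order_limit_sql = """
    SELECT people.uid,
           (SELECT lastvisit.date FROM tkting AS lastvisit
            WHERE lastvisit.uid = people.uid
              AND lastvisit.date < people.birthday_18
            ORDER BY lastvisit.date ASC
            LIMIT 1) AS visit
    FROM all_customers AS people ORDER BY people.uid
"""
# Suggested shape: the same subquery expressed with MIN().
min_sql = """
    SELECT people.uid,
           (SELECT MIN(lastvisit.date) FROM tkting AS lastvisit
            WHERE lastvisit.uid = people.uid
              AND lastvisit.date < people.birthday_18) AS visit
    FROM all_customers AS people ORDER BY people.uid
"""
# Assumption: an uncorrelated alternative using a join plus aggregation.
join_group_sql = """
    SELECT people.uid, MIN(t.date) AS visit
    FROM all_customers AS people
    JOIN tkting AS t ON t.uid = people.uid
    WHERE t.date < people.birthday_18
    GROUP BY people.uid
    ORDER BY people.uid
"""
order_limit_rows = conn.execute(order_limit_sql).fetchall()
min_rows = conn.execute(min_sql).fetchall()
join_group_rows = conn.execute(join_group_sql).fetchall()
print(order_limit_rows, min_rows, join_group_rows)
```

All three return the earliest qualifying visit per user. If the latest visit before the birthday is what is wanted, MAX() (or DESC ordering) would be the counterpart.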

How to order a recordset returned from a query by a field created within that same query (MS ACCESS)

The Query (For this question I do not think that you need to see schema):
SELECT Agencies.AgencyName, (SELECT DISTINCT MAX(Invoices.CostsTotal) FROM Invoices WHERE Contracts.ContractID = Invoices.ContractID) AS MaxInvoice
FROM Contracts
LEFT JOIN Agencies ON Contracts.AgencyID = Agencies.AgencyID
ORDER BY MaxInvoice DESC;
How do we order the recordset returned from a query by a field created within that same query?
I have seen a FIELDS(INDEX) function mentioned, but it does not seem to exist in Access, and I am not sure it would work anyway. In this instance I want to sort the recordset by the MaxInvoice field.
MS Access prompts me to enter a parameter value for MaxInvoice when I attempt to run this query.
You can write a parent SELECT that wraps your current SELECT.
Like this:
SELECT * FROM (
SELECT Agencies.AgencyName,
(SELECT DISTINCT MAX(Invoices.CostsTotal) FROM Invoices
WHERE Contracts.ContractID = Invoices.ContractID) AS MaxInvoice
FROM Contracts LEFT JOIN Agencies
ON Contracts.AgencyID = Agencies.AgencyID
) AS ContractsLargestInvoice
ORDER BY ContractsLargestInvoice.MaxInvoice DESC;
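The wrapping technique can be tried out with the sketch below. It uses SQLite via Python's sqlite3 as a stand-in, so it says nothing about Access-specific behaviour; the table layouts and rows are assumptions made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Agencies (AgencyID INTEGER PRIMARY KEY, AgencyName TEXT);
    CREATE TABLE Contracts (ContractID INTEGER PRIMARY KEY, AgencyID INTEGER);
    CREATE TABLE Invoices (InvoiceID INTEGER PRIMARY KEY,
                           ContractID INTEGER, CostsTotal REAL);
    INSERT INTO Agencies VALUES (1, 'North'), (2, 'South');
    INSERT INTO Contracts VALUES (10, 1), (20, 2);
    INSERT INTO Invoices VALUES (1, 10, 100.0), (2, 10, 250.0), (3, 20, 400.0);
""")

# The inner SELECT computes MaxInvoice; the outer SELECT sees it as a normal
# column of the derived table and can therefore sort on it.
wrapped_sql = """
    SELECT AgencyName, MaxInvoice FROM (
        SELECT Agencies.AgencyName,
               (SELECT MAX(Invoices.CostsTotal) FROM Invoices
                WHERE Contracts.ContractID = Invoices.ContractID) AS MaxInvoice
        FROM Contracts LEFT JOIN Agencies
             ON Contracts.AgencyID = Agencies.AgencyID
    ) AS ContractsLargestInvoice
    ORDER BY ContractsLargestInvoice.MaxInvoice DESC
"""
wrapped_rows = conn.execute(wrapped_sql).fetchall()
print(wrapped_rows)
```

The agency with the largest invoice sorts first.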
Most SQL dialects support the use of aliases in the ORDER BY. But MS Access is further from SQL standards than most databases.
I would suggest you rewrite the query to move Invoices into the FROM clause -- using aggregation to get what you want:
SELECT a.AgencyName, MAX(i.CostsTotal) AS MaxInvoice
FROM (Contracts as c LEFT JOIN
Agencies as a
ON c.AgencyID = a.AgencyID) LEFT JOIN
Invoices as i
ON i.ContractID = c.ContractID
GROUP BY a.AgencyName
ORDER BY MAX(i.CostsTotal) DESC;
It seems strange that you are using a LEFT JOIN and choosing a field from the second table and not the first. This could be NULL.

Subquery that accesses main table fields combined with LIMIT clause in Oracle SQL

I have a table Users and a table Tasks. Tasks are ordered by importance and are assigned to a user's task list. Tasks have a status: ready or not ready. Now, I want to list all users with their most important task that is also ready.
The interesting requirement is that the tasks for each user first need to be filtered and sorted, and only then should the most important one be selected. This is what I came up with:
SELECT Users.name,
(SELECT *
FROM (SELECT Tasks.description
FROM Tasks
WHERE Tasks.taskListCode = Users.taskListCode AND Tasks.isReady
ORDER BY Tasks.importance DESC)
WHERE rownum = 1
) AS nextTask
FROM Users
However, this results in the error
ORA-00904: "Users"."taskListCode": invalid identifier
I think the reason is that Oracle does not support correlating subqueries with more than one level of depth. However, I need two levels so that I can do the WHERE rownum = 1.
I also tried it without a correlating subquery:
SELECT Users.name, Task.description
FROM Users
LEFT JOIN Tasks nextTask ON
nextTask.taskListCode = Users.taskListCode AND
nextTask.importance = MAX(
SELECT tasks.importance
FROM tasks
WHERE tasks.isReady
GROUP BY tasks.id
)
This results in the error
ORA-00934: group function is not allowed here
How would I solve the problem?
One work-around for this uses keep:
SELECT u.name,
(SELECT MAX(t.description) KEEP (DENSE_RANK FIRST ORDER BY T.importance DESC)
FROM Tasks t
WHERE t.taskListCode = u.taskListCode AND t.isReady
) as nextTask
FROM Users u;
Please try with analytic function:
with tp as (select t.*, row_number() over (partition by taskListCode order by importance desc) r
from tasks t
where isReady = 1 /*or 'Y' or what is positive value here*/)
select u.name, tp.description
from users u left outer join tp on (u.taskListCode = tp.taskListCode and tp.r = 1);
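One detail worth checking with this pattern is where the rank filter sits. The sketch below (SQLite via Python's sqlite3, not Oracle; sample rows invented, isReady stored as 1/0 by assumption) runs the ranked CTE twice: once with the filter in WHERE, once with it in the ON clause of the LEFT OUTER JOIN. Window functions need SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Users (name TEXT, taskListCode TEXT);
    CREATE TABLE Tasks (description TEXT, taskListCode TEXT,
                        importance INTEGER, isReady INTEGER);
    INSERT INTO Users VALUES ('ann', 'A'), ('bob', 'B');
    INSERT INTO Tasks VALUES
        ('write report', 'A', 5, 1),
        ('file taxes',   'A', 9, 0),
        ('call client',  'A', 7, 1),
        ('water plants', 'B', 3, 0);
""")

# Ready tasks only, ranked per task list by descending importance.
ranked_cte = """
    WITH tp AS (
        SELECT t.*,
               ROW_NUMBER() OVER (PARTITION BY taskListCode
                                  ORDER BY importance DESC) AS r
        FROM Tasks t
        WHERE isReady = 1
    )
"""
# Filter in WHERE: rows where tp.r is NULL are discarded after the join,
# so users without a ready task disappear.
filter_in_where = conn.execute(ranked_cte + """
    SELECT u.name, tp.description
    FROM Users u LEFT OUTER JOIN tp ON u.taskListCode = tp.taskListCode
    WHERE tp.r = 1
    ORDER BY u.name
""").fetchall()
# Filter in ON: the outer join keeps every user and fills in NULL.
filter_in_on = conn.execute(ranked_cte + """
    SELECT u.name, tp.description
    FROM Users u LEFT OUTER JOIN tp
         ON u.taskListCode = tp.taskListCode AND tp.r = 1
    ORDER BY u.name
""").fetchall()
print(filter_in_where)
print(filter_in_on)
```

With the filter in WHERE only ann is returned; with it in ON, bob is listed too, with no task.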
Here is a solution that uses aggregation rather than analytic functions. You may want to run this against the analytic functions solution to see which is faster; in many cases aggregate queries are (slightly) faster, but it depends on your data, on index usage, etc.
This solution is similar to what Gordon tried to do. I don't know why he wrote it using a correlated subquery instead of a straight join (and don't know if it will work - I've never seen the FIRST/LAST function used with correlated subqueries like that).
It may not work exactly right if there may be NULL in the importance column - then you will need to add nulls first after t.importance and before ). Note: the max(t.description) is needed, because there may be ties by "importance" (two tasks with the same, highest importance for a given user). In that case, one task must be chosen. If the ordering by importance is strict (no ties), then the MAX() does nothing as it selects the MAX over a set of exactly one value, but the compiler doesn't know that beforehand so it does need the MAX().
select u.name,
max(t.description) keep (dense_rank last order by t.importance) as descr
from users u left outer join tasks t on u.tasklistcode = t.tasklistcode
and t.isready = 'Y'
group by u.name

SQL - unknown column in derived table

This is the problematic part of my query:
SELECT
(SELECT id FROM users WHERE name = 'John') as competitor_id,
(SELECT MIN(duration)
FROM
(SELECT duration FROM attempts
WHERE userid=competitor_id ORDER BY created DESC LIMIT 1,1
) x
) as best_time
On execution, it throws this error:
#1054 - Unknown column 'competitor_id' in 'where clause'
It looks like the derived table 'x' can't see the parent's query alias competitor_id. Is there any way how to create some kind of global alias, which will be usable by all derived tables?
I know I can just use the competitor_id query as a subquery directly in a WHERE clause and avoid using alias at all, but my real query is much bigger and I need to use competitor_id in more subqueries and derived tables, so it would be inefficient if I would used the same subquery more times.
You may not need to use derived tables within the select statement. Wouldn't the following accomplish the same thing?
SELECT
users.id as competitor_id,
MIN(duration) as best_time
FROM users
inner join attempts on users.id = attempts.userid
WHERE name = 'John'
group by users.id
The error is caused because an identifier introduced in a select output-clause cannot be referenced from anywhere else in that clause - basically, with SQL, identifiers/columns are pushed out and not down (or across).
But, even if it were possible, it's not good to write a query this way anyway. Use a JOIN between the users and attempts (on user id), then filter based on the name. The SQL query planner will then take the high-level relational algebra and write an efficient plan for it :) Note that there is no need for either a manual ordering or limit here as the aggregate (MIN) over a group handles that.
SELECT u.id, u.name, MIN(a.duration) as duration
FROM users u
-- match up each attempt per user
JOIN attempts a
ON a.userid = u.id
-- only show users with this name
WHERE u.name = 'John'
-- group so we get the min duration *per user*
-- (name is included so it can be in the output clause)
GROUP BY u.id, u.name
Something about your query seems rather strange. The innermost subquery is selecting one row and then you are taking the min(duration). The min is unnecessary, because there is only one row. You can phrase the query as:
SELECT u.id as competitor_id, a.duration as best_time
from users u left outer join
attempts a
on u.id = a.userid
where u.name = 'John'
order by a.created desc
limit 1, 1;
This seems to be what your query is attempting to do. However, this might not be your intention. It is probably giving the most recent time. (If you are using MySQL, then limit 1, 1 is actually taking the second most recent record). To get the smallest duration (presumably the "best"), you would do:
SELECT u.id as competitor_id, min(a.duration) as best_time
from users u left outer join
attempts a
on u.id = a.userid
where u.name = 'John'
Adding a group by u.id would ensure that this returns exactly one row.
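The difference between LIMIT 1, 1, a plain LIMIT 1, and MIN() discussed above can be seen in this small sketch. It runs on SQLite via Python's sqlite3, which accepts the same "offset, count" comma form as MySQL; the attempts data is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE attempts (userid INTEGER, duration INTEGER, created TEXT);
    INSERT INTO users VALUES (1, 'John');
    INSERT INTO attempts VALUES
        (1, 37, '2020-01-01'),
        (1, 55, '2020-01-02'),
        (1, 42, '2020-01-03');
""")

base = """
    FROM users u LEFT OUTER JOIN attempts a ON u.id = a.userid
    WHERE u.name = 'John'
"""
# LIMIT 1, 1 means "skip one row, return one row": the second most recent.
second_most_recent = conn.execute(
    "SELECT u.id, a.duration" + base + "ORDER BY a.created DESC LIMIT 1, 1"
).fetchall()
# LIMIT 1 returns the most recent attempt.
most_recent = conn.execute(
    "SELECT u.id, a.duration" + base + "ORDER BY a.created DESC LIMIT 1"
).fetchall()
# MIN() with GROUP BY returns the smallest duration, regardless of date.
best_time = conn.execute(
    "SELECT u.id, MIN(a.duration)" + base + "GROUP BY u.id"
).fetchall()
print(second_most_recent, most_recent, best_time)
```

The three queries return three different durations for the same user, which is why it matters which one the original query was meant to be.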

COUNT in a query with multiple JOINS and a GROUP BY CLAUSE

I am working on a database that contains 3 tables:
A list of companies
A table of the products they sell
A table of prices they offered on each date
I'm running a query like this in my PHP to generate a list of the companies offering the lowest prices on a certain product type on a certain date.
SELECT
a.name AS company,
c.id,
MIN(c.price) AS apy
FROM `companies` a
JOIN `company_products` b ON b.company_id = a.id
JOIN `product_prices` c ON c.product_id = b.id
WHERE
b.type = "%s"
AND c.date = "%s"
GROUP BY a.id
ORDER BY c.price ASC
LIMIT %d, %d
This gets me the data I need, but in order to implement a pager in PHP I need to know how many companies offering that product on that day there are in total. The LIMIT means that I only see the first few...
I tried changing the SELECT clause to SELECT COUNT(a.id) or SELECT COUNT(DISTINCT(a.id)) but neither of those seem to give me what I want. I tried removing the GROUP BY and ORDER BY in my count query, but that didn't work either. Any ideas?
Looks to me like you should GROUP BY a.id, c.id -- grouping by a.id only means you'll typically have several c.ids per a.id, and you're just getting a "random-ish" one of them. This seems like a question of basic correctness. Once you have fixed that, an initial SELECT COUNT(*) FROM etc etc should then definitely give you the number of rows the following query will return, so you can prepare your pager accordingly.
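A rough sketch of the "count first, then page" idea follows. It uses SQLite via Python's sqlite3 instead of MySQL and PHP, with invented rows and bound parameters in place of the "%s" placeholders. Two ways of getting the total are shown: COUNT(DISTINCT ...) with the GROUP BY removed, and COUNT(*) over the grouped query wrapped as a derived table. Both are assumptions about what the pager needs, namely the number of companies rather than the number of price rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE companies (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE company_products (id INTEGER PRIMARY KEY,
                                   company_id INTEGER, type TEXT);
    CREATE TABLE product_prices (product_id INTEGER, date TEXT, price REAL);
    INSERT INTO companies VALUES (1, 'Acme'), (2, 'Bolt'), (3, 'Core');
    INSERT INTO company_products VALUES
        (10, 1, 'cd'), (11, 1, 'cd'), (20, 2, 'cd'), (30, 3, 'loan');
    INSERT INTO product_prices VALUES
        (10, '2020-01-01', 1.5), (11, '2020-01-01', 2.0),
        (20, '2020-01-01', 1.2), (30, '2020-01-01', 0.9);
""")

# Shared FROM/WHERE part so the count and the page use identical filters.
joins_and_filter = """
    FROM companies a
    JOIN company_products b ON b.company_id = a.id
    JOIN product_prices c ON c.product_id = b.id
    WHERE b.type = ? AND c.date = ?
"""
params = ("cd", "2020-01-01")

# Total number of companies matching the filter, ignoring paging.
total_distinct = conn.execute(
    "SELECT COUNT(DISTINCT a.id)" + joins_and_filter, params
).fetchone()[0]
total_wrapped = conn.execute(
    "SELECT COUNT(*) FROM (SELECT a.id" + joins_and_filter
    + "GROUP BY a.id) AS grouped",
    params,
).fetchone()[0]

# One page of results: page size 1, offset 0.
page = conn.execute(
    "SELECT a.name, MIN(c.price) AS apy" + joins_and_filter
    + "GROUP BY a.id, a.name ORDER BY apy ASC LIMIT ? OFFSET ?",
    params + (1, 0),
).fetchall()
print(total_distinct, total_wrapped, page)
```

Both counts agree on two companies, while the page itself holds only the first of them.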
This website suggests MySQL has a special trick for this, at least as of version 4:
Luckily since MySQL 4.0.0 you can use SQL_CALC_FOUND_ROWS option in your query which will tell MySQL to count total number of rows disregarding LIMIT clause. You still need to execute a second query in order to retrieve row count, but it’s a simple query and not as complex as your query which retrieved the data.
Usage is pretty simple. In your main query you need to add the SQL_CALC_FOUND_ROWS option just after SELECT, and in the second query you need to use the FOUND_ROWS() function to get the total number of rows. The queries would look like this:
SELECT SQL_CALC_FOUND_ROWS name, email
FROM users
WHERE name LIKE 'a%'
LIMIT 10;
SELECT FOUND_ROWS();
The only limitation is that you must call the second query immediately after the first one, because SQL_CALC_FOUND_ROWS does not save the number of rows anywhere.