formula for computed column based on different table's column - sql

Consider this table: c_const
code | nvalue
--------------
1 | 10000
2 | 20000
and another table t_anytable
rec_id | s_id | n_code
---------------------
2 | x | 1
The goal is to have s_id be a computed column, based on this formula:
rec_id*(select nvalue from c_const where code=n_code)
This produces an error:
Subqueries are not allowed in this context. Only scalar expressions are allowed.
How can I calculate the value for this computed column using another table's column as an input?

You could create a user-defined function for this:
CREATE FUNCTION dbo.GetValue(@ncode INT, @recid INT)
RETURNS INT
AS
BEGIN
    RETURN (SELECT @recid * nvalue
            FROM c_const
            WHERE code = @ncode);
END
and then use that to define your computed column:
ALTER TABLE dbo.t_anytable
ADD NewColumnName AS dbo.GetValue(n_code, rec_id)

This seems to be more of a job for views (indexed views, if you need fast lookups on the computed column):
CREATE VIEW AnyView
WITH SCHEMABINDING
AS
SELECT a.rec_id, a.s_id, a.n_code, a.rec_id * c.nvalue AS foo
FROM dbo.t_anytable a
INNER JOIN dbo.c_const c
ON c.code = a.n_code
This has a subtle difference from the subquery version in that it would return multiple records instead of producing an error if there are multiple results for the join. But that is easily resolved with a UNIQUE constraint on c_const.code (I suspect it's already a PRIMARY KEY).
It's also a lot easier for someone to understand than the subquery version.
You can do it with a scalar UDF wrapping the subquery, as marc_s has shown, but that's likely to be far less efficient than a simple JOIN, since a scalar UDF has to be evaluated row by row.
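To see the view approach produce the computed value, here is a minimal sketch. It uses SQLite through Python's `sqlite3` module as a stand-in for SQL Server. `WITH SCHEMABINDING` and indexed views have no SQLite equivalent, so only the join logic carries over. The view name `any_view` is illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Sample data from the question; code is the PRIMARY KEY, so the join
    -- can never fan out into duplicate rows.
    CREATE TABLE c_const (code INTEGER PRIMARY KEY, nvalue INTEGER NOT NULL);
    CREATE TABLE t_anytable (rec_id INTEGER, s_id TEXT, n_code INTEGER);
    INSERT INTO c_const VALUES (1, 10000), (2, 20000);
    INSERT INTO t_anytable VALUES (2, 'x', 1);

    -- The "computed column" lives in a view that joins the lookup table.
    CREATE VIEW any_view AS
    SELECT a.rec_id, a.s_id, a.n_code, a.rec_id * c.nvalue AS foo
    FROM t_anytable a
    INNER JOIN c_const c ON c.code = a.n_code;
""")

rows = conn.execute("SELECT rec_id, n_code, foo FROM any_view").fetchall()
print(rows)  # rec_id 2 * nvalue 10000 for code 1
```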


Bigquery UDF to repeat queries. Error : Scalar subquery cannot have more than one column

I am trying to get unique values from multiple columns, but since the data structure is an array I can't directly do DISTINCT on all columns. I am using UNNEST() for each column and performing a UNION ALL for each column.
My idea is to create a UDF so that I can simply give the column name each time instead of performing the select every time.
I would like to replace this Query with a UDF since there are many feature columns and I need to do many UNION ALL.
SELECT DISTINCT user_log as unique_value,
'user_log' as feature
FROM `my_table`
left join UNNEST(user_Log) AS user_log
union all
SELECT DISTINCT page_name as unique_value,
'user_login_page_name' as feature
FROM `my_table`
left join UNNEST(PageName) AS page_name
order by feature;
My UDF
CREATE TEMP FUNCTION get_uniques(feature_name ARRAY<STRING>, feature STRING)
AS (
(SELECT DISTINCT feature as unique_value,
'feature' as feature
FROM `my_table`
left join UNNEST(feature_name) AS feature));
SELECT get_uniques(user_Log, log_feature);
However, the UDF to select the column doesn't really work and gives the error:
Scalar subquery cannot have more than one column unless using SELECT AS STRUCT to build STRUCT values; failed to parse CREATE [TEMP] FUNCTION statement at [8:1]
There is probably a better way of doing this. Appreciate your help.
Based on what you are trying to achieve:
My idea is to create a UDF so that I can simply give the column name each time instead of performing the select every time.
One approach is to use FORMAT in combination with EXECUTE IMMEDIATE to build your custom query and get the desired output.
The example below uses a function built on FORMAT to return a custom query string, and EXECUTE IMMEDIATE to run the combined query and retrieve the final output. I'm using a public dataset so you can also try it on your side:
CREATE TEMP FUNCTION GetUniqueValues(table_name STRING, col_name STRING, nest_col_name STRING)
AS (format("SELECT DISTINCT %s.%s as unique_val,'%s' as featured FROM %s ", col_name,nest_col_name,col_name,table_name));
EXECUTE IMMEDIATE (
select CONCAT(
(SELECT GetUniqueValues('bigquery-public-data.github_repos.commits','Author','name'))
,' union all '
,(SELECT GetUniqueValues('bigquery-public-data.github_repos.commits','Committer','name'))
,' limit 100'))
Output:
Row | unique_val | featured
1 | Sergio Garcia Murillo | Committer
2 | klimek | Committer
3 | marclaporte@gmail.com | Committer
4 | acoul | Committer
5 | knghtbrd | Committer
... | ... | ...
100 | Gustavo Narea | Committer
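If you would rather assemble the UNION ALL text outside BigQuery, the same string building can be done client-side. This is a minimal Python sketch built on two hypothetical helpers, `get_unique_values_sql` and `build_union_query`. It reproduces the FORMAT template above and only builds the query text; you would still run that text with EXECUTE IMMEDIATE or your BigQuery client. Table and column names are interpolated unescaped, so only pass trusted identifiers.

```python
def get_unique_values_sql(table_name: str, col_name: str, nest_col_name: str) -> str:
    # Same template as the FORMAT() call in GetUniqueValues above (hypothetical helper).
    return ("SELECT DISTINCT %s.%s as unique_val,'%s' as featured FROM %s "
            % (col_name, nest_col_name, col_name, table_name))


def build_union_query(table_name: str, columns: list[tuple[str, str]], limit: int = 100) -> str:
    # columns: (col_name, nest_col_name) pairs; one SELECT per pair, joined
    # with UNION ALL, exactly like the CONCAT(...) passed to EXECUTE IMMEDIATE.
    parts = [get_unique_values_sql(table_name, c, n) for c, n in columns]
    return " union all ".join(parts) + f" limit {limit}"


sql = build_union_query(
    "bigquery-public-data.github_repos.commits",
    [("Author", "name"), ("Committer", "name")],
)
print(sql)
```

Adding another feature column then means adding one tuple to the list rather than another hand-written SELECT.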

PostgreSQL: How to select on non-aggregating column?

Seems like a simple question but I'm having trouble accomplishing it. What I want to do is return all names that have duplicate ids. The view looks as such:
id | name | other_col
---+--------+----------
1 | James | x
2 | John | x
2 | David | x
3 | Emily | x
4 | Cameron| x
4 | Thomas | x
And so in this case, I'd just want the result:
name
-------
John
David
Cameron
Thomas
The following query works but it seems like an overkill to have two separate selects:
select name
from view where id = ANY(select id from view
WHERE other_col='x'
group by id
having count(id) > 1)
and other_col='x';
I believe it should be possible to do something along the lines of:
select name from view WHERE other_col='x' group by id, name having count(id) > 1;
But this returns nothing at all! What is the 'proper' query?
Do I just have to do it like my first working suggestion, or is there a better way?
You state you want to avoid two "queries", which isn't really possible. There are plenty of solutions available, but I would use a CTE like so:
WITH cte AS
(
SELECT
id,
name,
other_col,
COUNT(name) OVER(PARTITION BY id) AS id_count
FROM
view
)
SELECT name FROM cte WHERE id_count > 1;
You can reuse the CTE, so you don't have to duplicate logic and I personally find it easier to read and understand what it is doing.
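Here is a runnable sketch of the window-function CTE. It uses SQLite (3.25 or newer, for window functions) through Python's `sqlite3` module as a stand-in for Postgres. `names_view` is a hypothetical table holding the sample rows from the question, and the question's `other_col = 'x'` filter is folded into the CTE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE names_view (id INTEGER, name TEXT, other_col TEXT);
    INSERT INTO names_view VALUES
        (1, 'James', 'x'), (2, 'John', 'x'), (2, 'David', 'x'),
        (3, 'Emily', 'x'), (4, 'Cameron', 'x'), (4, 'Thomas', 'x');
""")

# COUNT(...) OVER (PARTITION BY id) attaches the group size to every row
# without collapsing rows, so name stays selectable.
names = [r[0] for r in conn.execute("""
    WITH cte AS (
        SELECT id, name, COUNT(name) OVER (PARTITION BY id) AS id_count
        FROM names_view
        WHERE other_col = 'x'
    )
    SELECT name FROM cte WHERE id_count > 1 ORDER BY id, name
""")]
print(names)
```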
SELECT name FROM view
WHERE id IN (SELECT id FROM view GROUP BY id HAVING COUNT(*) > 1)
Use the EXISTS operator:
SELECT * FROM view t1
WHERE EXISTS(
SELECT null FROM view t2
WHERE t1.id = t2.id
AND t1.name <> t2.name
)
Use a join:
select distinct name
from view v1
join view v2 on v1.id = v2.id
and v1.name != v2.name
The use of distinct is there in case there are more than 2 rows sharing the same id. If that's not possible, you can omit distinct.
A note: Naming a column id when it's not unique will likely cause confusion, because it's the industry standard for the unique identifier column. If there isn't a unique column at all, it will cause coding difficulties.
Do not use a CTE here. Up to Postgres 11 that's typically more expensive, because Postgres always materializes the intermediate result of a CTE (since Postgres 12, simple CTEs can be inlined).
An EXISTS semi-join is typically fastest for this. Just make sure to repeat predicates (or match the values):
SELECT name
FROM view v
WHERE other_col = 'x'
AND EXISTS (
SELECT 1 FROM view
WHERE other_col = 'x' -- or: other_col = v.other_col
AND id = v.id -- same id ...
AND name <> v.name -- ... but exclude join to self
);
That's a single query, even if you see the keyword SELECT twice here. An EXISTS expression does not produce a derived table, it will be resolved to simple index look-ups.
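To check the semi-join logic (same `id`, different row), here is a sketch that runs it in SQLite through Python's `sqlite3` module, purely as a stand-in for Postgres. `names_view` is a hypothetical table with the question's sample rows. The sketch assumes no two rows share both `id` and `name`; with a real unique key you would compare that key instead of `name`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE names_view (id INTEGER, name TEXT, other_col TEXT);
    INSERT INTO names_view VALUES
        (1, 'James', 'x'), (2, 'John', 'x'), (2, 'David', 'x'),
        (3, 'Emily', 'x'), (4, 'Cameron', 'x'), (4, 'Thomas', 'x');
""")

# Keep a row if some *other* row (different name) shares its id.
names = [r[0] for r in conn.execute("""
    SELECT name
    FROM names_view v
    WHERE other_col = 'x'
      AND EXISTS (
          SELECT 1 FROM names_view
          WHERE other_col = v.other_col
            AND id = v.id        -- same id ...
            AND name <> v.name   -- ... but not the row itself
      )
    ORDER BY id, name
""")]
print(names)
```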
Speaking of which: a multicolumn index on (other_col, id) on the table underlying the view should help (a plain view cannot be indexed itself). Depending on data distribution and access patterns, appending the payload column name to enable index-only scans might help: (other_col, id, name). Or even a partial index, if other_col = 'x' is a constant predicate:
CREATE INDEX ON view (id) WHERE other_col = 'x';
Postgres 9.6 (upcoming when this was written) even allows an index-only scan on the partial index:
CREATE INDEX ON view (id, name) WHERE other_col = 'x';
You will love this improvement (quoting the /devel manual):
Allow using an index-only scan with a partial index when the index's
predicate involves column(s) not stored in the index (Tomas Vondra,
Kyotaro Horiguchi)
An index-only scan is now allowed if the query mentions such columns
only in WHERE clauses that match the index predicate
Verify performance with EXPLAIN (ANALYZE, TIMING OFF) SELECT ...
Run a couple of times to rule out caching effects.

PostgreSQL - Check foreign key exists when doing a SELECT

Suppose I have the following data:
Table some_table:
some_table_id | value | other_table_id
--------------------------------------
1 | foo | 1
2 | bar | 2
Table other_table:
other_table_id | value
----------------------
1 | foo
2 | bar
Here, some_table.other_table_id is a foreign key referencing other_table.other_table_id.
With the following query in PostgreSQL:
SELECT *
FROM some_table
WHERE other_table_id = 3;
As you can see, 3 does not exist in other_table, so this query obviously returns 0 rows.
Without doing a second query, is there a way to know if the foreign key that I am using as a filter effectively does not exist in the other_table?
Ideally as an error that could later be parsed (as happens when doing an INSERT or an UPDATE with a wrong foreign key, for example).
You can exploit a feature of PL/pgSQL to implement this very cheaply:
CREATE OR REPLACE FUNCTION f_select_from_some_tbl(int)
RETURNS SETOF some_table AS
$func$
BEGIN
RETURN QUERY
SELECT *
FROM some_table
WHERE other_table_id = $1;
IF NOT FOUND THEN
RAISE WARNING 'Call with non-existing other_table_id >>%<<', $1;
END IF;
END
$func$ LANGUAGE plpgsql;
A final RETURN; is optional in this case.
The WARNING is only raised if your query doesn't return any rows. I am not raising an ERROR in the example, since that would roll back the whole transaction (but you can do that if it fits your needs).
We've added a code example to the manual with Postgres 9.3 to demonstrate this.
If you perform an INSERT or UPDATE on some_table, specifying an other_table_id value that does not in fact exist in other_table, then you will get an error arising from violation of the foreign key constraint. SELECT queries are therefore your primary concern.
One way you could address the issue with SELECT queries would be to transform your queries to perform an outer join with other_table, like so:
SELECT st.*
FROM
other_table ot
LEFT JOIN some_table st ON st.other_table_id = ot.other_table_id
WHERE ot.other_table_id = 3;
That query will always return at least one row if any other_table row has other_table_id = 3. In that case, if there is no matching some_table row, then it will return exactly one row, with that row having all columns NULL (given that it selects only columns from some_table), even columns that are declared not null.
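The three possible outcomes of the outer-join query can be checked with a small sketch. It runs the query in SQLite via Python's `sqlite3` module as a stand-in for Postgres. The extra `other_table` row `(3, 'baz')` and the helper `lookup` are hypothetical additions, used so that one key exists without any `some_table` rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE other_table (other_table_id INTEGER PRIMARY KEY, value TEXT);
    CREATE TABLE some_table (
        some_table_id INTEGER PRIMARY KEY,
        value TEXT,
        other_table_id INTEGER REFERENCES other_table (other_table_id)
    );
    INSERT INTO other_table VALUES (1, 'foo'), (2, 'bar'), (3, 'baz');
    INSERT INTO some_table VALUES (1, 'foo', 1), (2, 'bar', 2);
""")


def lookup(fk_value):
    # Filter on ot (the preserved side of the LEFT JOIN); filtering on st
    # would discard the all-NULL row and behave like an inner join.
    return conn.execute("""
        SELECT st.*
        FROM other_table ot
        LEFT JOIN some_table st ON st.other_table_id = ot.other_table_id
        WHERE ot.other_table_id = ?
    """, (fk_value,)).fetchall()


print(lookup(1))  # matching some_table row
print(lookup(3))  # key exists, no some_table rows: one all-NULL row
print(lookup(4))  # key does not exist at all: no rows
```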
If you want such queries to raise an error then you'll probably need to write a custom function to assist, but it can be done. I'd probably implement it in PL/pgSQL, using that language's RAISE statement.

How to use the same function like Oracle Rownum in MS ACCESS

I am encountering a problem. I wrote a function that loads data based on the scroll position, and its SQL statement uses ROWNUM, which only works in Oracle, not in Access.
I would like to query the data and re-sort it:
ID value
1 aa
3 bb
With ROWNUM we can do this:
NID ID value
1 1 aa
2 3 bb
How can I write this SQL statement in Microsoft Access?
Access does not support that function. If your ID field is a numeric primary key, you can include a field expression that counts the rows whose ID is <= the current row's ID value.
SELECT
DCount('*', 'YourTable', 'ID <= ' & y.ID) AS NID,
y.ID,
y.value
FROM YourTable AS y;
You could use a correlated subquery instead of DCount if you prefer.
And ID does not actually have to be a primary key. If it has a unique constraint it is still suitable for this purpose.
And the targeted field does not absolutely have to be a number, but text data type can be more challenging.
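The correlated-subquery alternative to DCount can be sketched as follows. It is run in SQLite via Python's `sqlite3` module because Access is not scriptable here, so treat it as untested in Access itself. The assumption is that the same SELECT text carries over to Access largely unchanged. `YourTable` and its sample rows come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE YourTable (ID INTEGER PRIMARY KEY, value TEXT);
    INSERT INTO YourTable VALUES (1, 'aa'), (3, 'bb');
""")

# NID = number of rows whose ID is <= this row's ID, i.e. a 1-based row number
# that relies on ID being unique.
rows = conn.execute("""
    SELECT (SELECT COUNT(*) FROM YourTable AS t WHERE t.ID <= y.ID) AS NID,
           y.ID,
           y.value
    FROM YourTable AS y
    ORDER BY y.ID
""").fetchall()
print(rows)
```

Like DCount, this recounts for every row, so it gets slower as the table grows. An index on ID keeps each count cheap.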

How to access columns on a cursor which is a join on all elements of two tables in Oracle PL/SQL

I am trying to run a cursor on a full join of two tables but am having trouble accessing the columns in the cursor.
CREATE TABLE APPLE(
MY_ID VARCHAR(2) NOT NULL,
A_TIMESTAMP TIMESTAMP,
A_NAME VARCHAR(10)
);
CREATE TABLE BANANA(
MY_ID VARCHAR(2) NOT NULL,
B_TIMESTAMP TIMESTAMP,
B_NAME VARCHAR(10)
);
I have written a full join to return all related rows from tables APPLE and BANANA where either of the two timestamps is in the future.
i.e. if a row in table APPLE has timestamp in future then fetch row from APPLE joined with row from BANANA on MY_ID
Similarly, if a row in table BANANA has timestamp in future then fetch row from BANANA joined with row from APPLE on MY_ID
This full join works for me.
select * from APPLE a full join BANANA b on a.MY_ID = b.MY_ID where
(
a.A_TIMESTAMP > current_timestamp
or b.B_TIMESTAMP > current_timestamp
);
Now I want to iterate over each joined record and do some processing. I am able to access the columns that are present in only one table, but I get an error when trying to access columns whose names are the same in both tables, for example MY_ID in this case.
create or replace
PROCEDURE testProc(someDate IN DATE)
AS
CURSOR c1 IS
select * from APPLE a full join BANANA b on a.MY_ID = b.MY_ID where
(
a.A_TIMESTAMP > current_timestamp
or b.B_TIMESTAMP > current_timestamp
);
BEGIN
FOR rec IN c1
LOOP
DBMS_OUTPUT.PUT_LINE(rec.A_NAME);
DBMS_OUTPUT.PUT_LINE(rec.A_TIMESTAMP);
DBMS_OUTPUT.PUT_LINE(rec.MY_ID);
END LOOP;
END testProc;
I get this error when I compile the above proc:
Error(16,28): PLS-00302: component 'MY_ID' must be declared
and I am not sure how to access the MY_ID element. I am sure it is pretty straightforward, but I am new to database programming and have been trying without finding the right way to do it.
Any help is appreciated.
Thanks
One other thing you can do in this case is to join the tables with the USING clause instead of using ON, as in:
select *
from APPLE a
full join BANANA b
USING (MY_ID)
where a.A_TIMESTAMP > current_timestamp or
b.B_TIMESTAMP > current_timestamp
USING can be used if the columns in both tables have the same name and the key values are compared with the equality ('=') operator. The result set will then have a single column named MY_ID, along with the other columns from both tables (A_TIMESTAMP, B_TIMESTAMP, etc.).
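The effect of USING on `SELECT *` can be observed directly in the result-set column names. The sketch runs in SQLite through Python's `sqlite3` module as a stand-in for Oracle. It uses a LEFT JOIN, because older SQLite builds lack FULL JOIN, and it leaves out the timestamp filter. The sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE APPLE (MY_ID VARCHAR(2) NOT NULL, A_TIMESTAMP TIMESTAMP, A_NAME VARCHAR(10));
    CREATE TABLE BANANA (MY_ID VARCHAR(2) NOT NULL, B_TIMESTAMP TIMESTAMP, B_NAME VARCHAR(10));
    INSERT INTO APPLE VALUES ('01', NULL, 'apple1');
    INSERT INTO BANANA VALUES ('01', NULL, 'banana1');
""")


def column_names(sql):
    # cursor.description holds one 7-tuple per result column; [0] is the name.
    return [d[0] for d in conn.execute(sql).description]


on_cols = column_names("SELECT * FROM APPLE a LEFT JOIN BANANA b ON a.MY_ID = b.MY_ID")
using_cols = column_names("SELECT * FROM APPLE a LEFT JOIN BANANA b USING (MY_ID)")
print(on_cols)     # MY_ID appears twice: ambiguous for a record field
print(using_cols)  # MY_ID appears once
```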
Share and enjoy.
I assume the problem is that MY_ID is defined in both tables, so * gets both of them. Try defining the cursor using this query:
select coalesce(A.MY_ID, B.MY_ID) as MY_ID,
A_TIMESTAMP, A_NAME, B_TIMESTAMP, B_NAME
from APPLE a full join
BANANA b
on a.MY_ID = b.MY_ID
where a.A_TIMESTAMP > current_timestamp or b.B_TIMESTAMP > current_timestamp;
EDIT:
You have two issues with conflicting columns. If this were just an inner join, you could do:
select A.*, B_TIMESTAMP, B_NAME
That is, you can select the columns from one table using * and the rest individually. However, this is a full outer join, so there is a set of columns where you want to use coalesce().
So, the best answer is that you should list out all the columns. This is good coding practice anyway, and helps protect the code from inadvertent mistakes when columns are added or removed from the table.
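The role of `coalesce()` in a full outer join is easiest to see in a small sketch with an explicit column list. The query runs in SQLite via Python's `sqlite3` module as a stand-in for Oracle. Older SQLite builds lack FULL OUTER JOIN, so the sketch emulates one as a LEFT JOIN plus the unmatched BANANA rows. That emulation is a substitute technique, not the answer's query. The timestamp columns and filter are left out, and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE APPLE (MY_ID VARCHAR(2) NOT NULL, A_NAME VARCHAR(10));
    CREATE TABLE BANANA (MY_ID VARCHAR(2) NOT NULL, B_NAME VARCHAR(10));
    INSERT INTO APPLE VALUES ('01', 'apple1'), ('02', 'apple2');
    INSERT INTO BANANA VALUES ('02', 'banana2'), ('03', 'banana3');
""")

# Emulated FULL OUTER JOIN: all APPLE rows (matched or not), plus BANANA rows
# with no APPLE partner. COALESCE picks MY_ID from whichever side is present.
cur = conn.execute("""
    SELECT COALESCE(a.MY_ID, b.MY_ID) AS MY_ID, a.A_NAME, b.B_NAME
    FROM APPLE a LEFT JOIN BANANA b ON a.MY_ID = b.MY_ID
    UNION ALL
    SELECT COALESCE(a.MY_ID, b.MY_ID), a.A_NAME, b.B_NAME
    FROM BANANA b LEFT JOIN APPLE a ON a.MY_ID = b.MY_ID
    WHERE a.MY_ID IS NULL
    ORDER BY MY_ID
""")
cols = [d[0] for d in cur.description]
rows = cur.fetchall()
print(cols)
print(rows)
```

Every result column has a unique name here, which is what makes `rec.MY_ID` resolvable in the PL/SQL cursor loop.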