I have the following select query, which uses a scalar function to get the full name. I want to eliminate the redundancy by using a variable, but so far without success. My query is:
select
a.Id,
a.UserName,
getFullName(a.UserName),
a.CreateTime
from DataTable a;
I don't want to retrieve a.UserName twice. I would prefer to save a.UserName in a variable and then pass it to the function, which should improve efficiency.
The workaround I have come up with so far is the following:
select
Id,
UserName,
getFullName(UserName),
CreateTime
from (select a.Id, a.UserName, a.CreateTime from DataTable a) temp;
This solves the performance issue, but adds the overhead of writing the same select twice. Any other suggestions would be great.
DataTable looks like this:
+----+----------+------------+
| Id | UserName | CreateTime |
+----+----------+------------+
| 1  | ab       | 10:00      |
| 2  | cd       | 11:00      |
| 3  | ef       | 12:00      |
+----+----------+------------+
Here is the NamesTable used to get the full names
+----------+----------+
| UserName | FullName |
+----------+----------+
| ab       | Aa BB    |
| cd       | Cc Dd    |
| ef       | Ee Ff    |
+----------+----------+
Here is the function that gets the full name
Create function [dbo].[getFullName](@user varchar(150)) returns varchar(500)
as
begin
    -- Look up the full name for the given user name; returns NULL if there is no match.
    declare @Result varchar(500);
    select @Result = FullName from dbo.NamesTable where UserName = @user;
    return @Result;
end;
You're solving a problem that doesn't exist. You seem to think that
select
a.Id,
a.UserName,
getFullName(a.UserName),
a.CreateTime
from DataTable a;
has some relatively expensive process behind it to get UserName, and that this process happens twice. In reality, once the record is located, getting the UserName value is virtually instant, since the SQL engine will probably hold it in a "variable" behind the scenes. You should see little to no performance difference between that query and
select
a.Id,
getFullName(a.UserName),
a.CreateTime
from DataTable a;
The scalar function itself may have a performance issue, but it's not because you are "pulling" the UserName value "twice".
A better method would be to join to the other table:
select
a.Id,
a.UserName,
b.FullName,
a.CreateTime
from DataTable a
LEFT JOIN dbo.NamesTable b
ON a.UserName = b.UserName
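As a rough check of both points (the function runs once per row whether or not UserName is also selected, and the join returns the same names), here is a small sketch. It is only an illustration under assumptions: it uses Python's built-in sqlite3 module as a stand-in for SQL Server, and a Python callback registered under the name getFullName in place of the T-SQL function, so it shows the logic and says nothing about SQL Server's actual costs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE DataTable (Id INTEGER, UserName TEXT, CreateTime TEXT);
    CREATE TABLE NamesTable (UserName TEXT, FullName TEXT);
    INSERT INTO DataTable VALUES (1, 'ab', '10:00'), (2, 'cd', '11:00'), (3, 'ef', '12:00');
    INSERT INTO NamesTable VALUES ('ab', 'Aa BB'), ('cd', 'Cc Dd'), ('ef', 'Ee Ff');
""")

# Stand-in for the scalar function: a lookup that records every invocation.
names = dict(conn.execute("SELECT UserName, FullName FROM NamesTable"))
call_log = []

def get_full_name(user):
    call_log.append(user)
    return names.get(user)

conn.create_function("getFullName", 1, get_full_name)

# Query that selects UserName and also passes it to the function.
with_username = conn.execute(
    "SELECT a.Id, a.UserName, getFullName(a.UserName), a.CreateTime FROM DataTable a"
).fetchall()
calls_with = len(call_log)
call_log.clear()

# Query that only passes UserName to the function.
without_username = conn.execute(
    "SELECT a.Id, getFullName(a.UserName), a.CreateTime FROM DataTable a"
).fetchall()
calls_without = len(call_log)

# Set-based alternative: no function at all.
via_join = conn.execute(
    "SELECT a.Id, a.UserName, b.FullName, a.CreateTime "
    "FROM DataTable a LEFT JOIN NamesTable b ON a.UserName = b.UserName"
).fetchall()

print(calls_with, calls_without)
print(sorted(with_username) == sorted(via_join))
```

In this sketch the function is invoked once per row in both variants, so selecting UserName a second time does not add any calls, and the join produces the same rows without calling anything per row.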
As D Stanley says, you're trying to solve a problem that doesn't exist. I would further add that you shouldn't be using the function at all. SQL is meant to perform set-based operations. When you use a function like that, you make it perform the same lookup over and over again for every row, which is a horrible practice. Instead, just JOIN in the other table (a set-based operation) and let SQL do what it does best:
SELECT
DT.Id,
DT.UserName,
NT.fullname,
DT.CreateTime
FROM
DataTable DT
INNER JOIN NamesTable NT ON NT.username = DT.username;
Also, DataTable and NamesTable are terrible names for tables. Of course they're tables, so there's no need to put "table" on the end of the name. Further, of course the first one holds "data", it's a database. Your table names should be descriptive. What exactly does DataTable hold?
If you're going to be doing SQL development in the future then I strongly suggest that you read several introductory books on the subject and watch as many tutorial videos as you can find.
A scalar UDF will execute for every row, but definitely not in the way you think. Below is a sample demo and an execution plan that show this.
create table testid
(
id int,
name varchar(20)
)
insert into testid
select n,'abc'
from numbers
where n<=1000000
create index nci_get on dbo.testid(id,name)
select id,name,dbo.getusername(id) from dbo.testid where id>4
Below is the execution plan for the above query.
Decoding the plan:
The Index Seek outputs id and name.
The Compute Scalar then calculates new values from the existing row values; in this case that is Expr1003, which is our function.
The Index Seek cost is 97% and the Compute Scalar cost is 3%. As you may be aware, an Index Seek is not an operator that goes to the table to get data. Hopefully this answers your question.
Related
There are 2 tables:
Table 1: first_names
id | first_name
1 | Joey
7 | Ross
17| Chandler
Table 2: last_names
id | last_name
2 | Tribbiani
7 | Geller
25| Bing
Desired result:
id | full_name
1 | Joey Tribbiani
2 | Ross Geller
3 | Chandler Bing
Task:
Write the solution using only the simplest SQL syntax. Stored procedures, declared variables, and the ROW_NUMBER() and RANK() functions are forbidden.
I have a solution using the ROW_NUMBER() function, but no idea how to solve this task using only the simplest SQL syntax.
P.S. I'm only a trainee and this is my first question on Stack Overflow.
Simple join will suffice here
select * from first_names fn
join last_names ln on fn.id = ln.id - 1
But your question is very unclear, because the join here is based on knowledge of the Friends series rather than on concrete logic...
You must create an id to join the tables on.
This can be each row's ordinal position in its table, based on the ids:
select
f.counter id, concat(f.first_name, ' ', l.last_name) full_name
from (
select t.*, (select count(*) from first_names where id < t.id) + 1 counter
from first_names t
) f inner join (
select t.*, (select count(*) from last_names where id < t.id) + 1 counter
from last_names t
) l
on l.counter = f.counter
See the demo.
Results:
id | full_name
1 | Joey Tribbiani
2 | Ross Geller
3 | Chandler Bing
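To try the counting idea without a database server, here is a small sketch that runs the same query shape against the sample data. It assumes SQLite (through Python's built-in sqlite3 module) rather than the asker's engine, so string concatenation is written with || instead of concat(), and the inner alias x is added only to make the correlated subquery easier to read.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE first_names (id INTEGER, first_name TEXT);
    CREATE TABLE last_names  (id INTEGER, last_name  TEXT);
    INSERT INTO first_names VALUES (1, 'Joey'), (7, 'Ross'), (17, 'Chandler');
    INSERT INTO last_names  VALUES (2, 'Tribbiani'), (7, 'Geller'), (25, 'Bing');
""")

# counter = number of rows with a smaller id, plus one,
# i.e. the row's position when the table is ordered by id.
query = """
    SELECT f.counter AS id,
           f.first_name || ' ' || l.last_name AS full_name
    FROM (SELECT t.first_name,
                 (SELECT COUNT(*) FROM first_names x WHERE x.id < t.id) + 1 AS counter
          FROM first_names t) f
    JOIN (SELECT t.last_name,
                 (SELECT COUNT(*) FROM last_names x WHERE x.id < t.id) + 1 AS counter
          FROM last_names t) l
      ON l.counter = f.counter
    ORDER BY f.counter
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Note that the counter is only a gap-free position if the ids within each table are unique; duplicate ids would receive the same counter.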
Honestly, this is a stupid solution; it's vastly less efficient than ROW_NUMBER, and I wouldn't be surprised if LEAD is "not allowed" since ROW_NUMBER isn't. If you were told to "use the simplest SQL", then the SQL you want is a subquery/CTE and ROW_NUMBER; that is as simple as this can really get. Anything else adds a layer of unneeded complexity and will likely just degrade the query's performance. This one, for example, needs to scan both tables twice, whereas with ROW_NUMBER it would be once.
CREATE TABLE FirstNames (id int, FirstName varchar(10));
CREATE TABLE LastNames (id int, LastName varchar(10));
INSERT INTO FirstNames
VALUES(1,'Joey'),
(7,'Ross'),
(17,'Chandler');
INSERT INTO LastNames
VALUES (2,'Tribbiani'),
(7,'Geller'),
(25,'Bing');
GO
WITH CTE AS(
SELECT FN.id,
FN.FirstName,
LN.LastName
FROM FirstNames FN
LEFT JOIN LastNames LN ON FN.id = LN.id
UNION ALL
SELECT LN.id,
FN.FirstName,
LN.LastName
FROM LastNames LN
LEFT JOIN FirstNames FN ON LN.id = FN.id
WHERE FN.id IS NULL),
FullNames AS(
SELECT C.id,
C.FirstName,
ISNULL(C.LastName, LEAD(C.LastName) OVER (ORDER BY id)) AS LastName
FROM CTE C)
SELECT *
FROM FullNames FN
WHERE FN.FirstName IS NOT NULL
ORDER BY FN.id;
GO
DROP TABLE FirstNames;
DROP TABLE LastNames;
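For comparison, the ROW_NUMBER approach that the task forbids could look roughly like the sketch below. It is an illustration only: it runs on SQLite through Python's sqlite3 module (window functions need SQLite 3.25 or later) instead of SQL Server, and the CTE names NumberedFirst and NumberedLast are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE FirstNames (id INTEGER, FirstName TEXT);
    CREATE TABLE LastNames  (id INTEGER, LastName  TEXT);
    INSERT INTO FirstNames VALUES (1, 'Joey'), (7, 'Ross'), (17, 'Chandler');
    INSERT INTO LastNames  VALUES (2, 'Tribbiani'), (7, 'Geller'), (25, 'Bing');
""")

# Number each table by id, then pair rows that share the same position.
query = """
    WITH NumberedFirst AS (
        SELECT FirstName, ROW_NUMBER() OVER (ORDER BY id) AS rn FROM FirstNames
    ),
    NumberedLast AS (
        SELECT LastName, ROW_NUMBER() OVER (ORDER BY id) AS rn FROM LastNames
    )
    SELECT nf.rn AS id,
           nf.FirstName || ' ' || nl.LastName AS FullName
    FROM NumberedFirst nf
    JOIN NumberedLast nl ON nl.rn = nf.rn
    ORDER BY nf.rn
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Each table is numbered in a single pass and the two numbered sets are joined on position, which is why this form is usually considered the straightforward one.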
To answer the "Task" given:
"Task: Write the solution using only the simplest SQL syntax. Using store procedures, declaring variables, ROW_NUMBER(), RANK() functions are forbidden."
My answer would be the below:
"Why is this a requirement? SQL Server has supported ROW_NUMBER for 14 years, since SQL Server 2005. If you can't use ROW_NUMBER, this implies you're using SQL Server 2000. That is actually a big security problem for the company, as 2000 has been out of support for close to a decade. Legislation like GDPR requires a company to keep the technology it uses secure, and it is very unlikely that this requirement is being met.
If this is the case, the solution is not to find a way around using ROW_NUMBER but to get the company back up to date. The latest version of SQL Server that you can upgrade to from SQL Server 2000 is 2008, which also runs out of support on July 16 of this year. We'll need to get an instance up and running, get the existing features onto this new server, and get QA testing done as soon as possible. This needs to be the highest priority. After that we need to repeat the cycle with another version of SQL Server. The latest is 2017, which does support migration from 2008.
Once we've done that, we can then actually make use of ROW_NUMBER in the query; providing the simplest solution and also bringing the company back into a secure environment."
Sometimes requirements need to be challenged. From experience, management can make some "stupid" requirements because they don't understand the technology. When you're in an IT role, you will sometimes need to question those requirements and explain why a requirement isn't actually a good idea. Then, instead, you can help management find the correct solution for the problem. At the end of the day, what they are trying to fix might be an XY problem, and part of your troubleshooting will be to find out what X really is.
What is the difference between
select * from degreeprogram NATURAL JOIN degreeprogram ;
and
select * from degreeprogram d1 NATURAL JOIN degreeprogram d2;
in Oracle?
I expected them to return the same result set, but they do not. The second query does what I expect: it joins the two relations on the identically named attributes, so it returns the same tuples as are stored in degreeprogram. The first query, however, confuses me: each tuple occurs several times in the result set. What join condition is used here?
Thank you
NATURAL JOIN means join the two tables based on all columns having the same name in both tables.
I imagine that for each column in your table, Oracle is internally writing a condition like:
degreeprogram.column1 = degreeprogram.column1
(which you would not be able to write yourself due to ORA-00918 column ambiguously defined error)
And then, I imagine, Oracle is optimizing that away to just
degreeprogram.column1 is not null
So, you're not exactly getting a CROSS JOIN of your table with itself -- only a CROSS JOIN of those rows having no null columns.
UPDATE: Since this was the selected answer, I will just add from Thorsten Kettner's answer that this behavior is probably a bug on Oracle's part. In 18c, Oracle behaves properly and returns an ORA-00918 error when you try to NATURAL JOIN a table to itself.
The difference between those two statements is that the second explicitly defines a self join on the table, whereas in the first the optimizer has to figure out what you really want. On my database, the first statement performs a cartesian merge join and is not optimized at all, while the second statement has a better explain plan, using a single full table access with index scanning.
I'd call this a bug. This query:
select * from degreeprogram d1 NATURAL JOIN degreeprogram d2;
translates to
select col1, col2, ... -- all columns
from degreeprogram d1
join degreeprogram d2 using (col1, col2, ...)
and gives you all rows from the table where all columns are not null (because using(col) never matches nulls).
This query, however:
select * from degreeprogram NATURAL JOIN degreeprogram;
is invalid according to standard SQL, because every table must have a unique name or alias in a query. Oracle lets this pass, but in doing so it should still do something to keep the table instances apart (e.g. internally create an alias for each). It obviously doesn't, and it multiplies the result by the number of rows in the table. A bug.
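The claim about the aliased form (only rows without NULLs survive, because an equality join never matches NULL to NULL) can be checked with a small sketch. It runs on SQLite through Python's sqlite3 module rather than Oracle, and the columns code and title are invented for the example; only the NULL comparison behaviour is being illustrated, not Oracle's handling of the unaliased query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical columns; the real degreeprogram layout is not given.
    CREATE TABLE degreeprogram (code TEXT, title TEXT);
    INSERT INTO degreeprogram VALUES ('CS', 'Computer Science'), ('XX', NULL);
""")

# Aliased self NATURAL JOIN: joins on every common column (code and title).
natural = conn.execute(
    "SELECT * FROM degreeprogram d1 NATURAL JOIN degreeprogram d2"
).fetchall()

# The equivalent explicit form with USING.
using = conn.execute(
    "SELECT * FROM degreeprogram d1 JOIN degreeprogram d2 USING (code, title)"
).fetchall()

print(natural)
print(using)
```

The row with the NULL title is missing from both results, because NULL = NULL does not evaluate to true in a join condition.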
A so-called natural join instructs the database to:
1. Find all column names common to both tables (in this case, degreeprogram and degreeprogram, which of course have the same columns).
2. Generate a join condition for each pair of matching column names, in the form table1.column1 = table2.column1 (in this case, there will be one for every column in degreeprogram).
Therefore a query like this
select count(*) from demo natural join demo;
will be transformed into
select count(*) from demo, demo where demo.x = demo.x;
I checked this by creating a table with one column and two rows:
create table demo (x integer);
insert into demo values (1);
insert into demo values (2);
commit;
and then tracing the session:
SQL> alter session set tracefile_identifier='demo_trace';
Session altered.
SQL> alter session set events 'trace [SQL_Compiler.*]';
Session altered.
SQL> select /* nj test */ count(*) from demo natural join demo;
COUNT(*)
----------
4
1 row selected.
SQL> alter session set events 'trace [SQL_Compiler.*] off';
Session altered.
Then in twelve_ora_6196_demo_trace.trc I found this line:
Final query after transformations:******* UNPARSED QUERY IS *******
SELECT COUNT(*) "COUNT(*)" FROM "WILLIAM"."DEMO" "DEMO","WILLIAM"."DEMO" "DEMO" WHERE "DEMO"."X"="DEMO"."X"
and a few lines later:
try to generate single-table filter predicates from ORs for query block SEL$58A6D7F6 (#0)
finally: "DEMO"."X" IS NOT NULL
(This is merely an optimisation on top of the generated query above, as column X is nullable but the join allows the optimiser to infer that only non-null values are required. It doesn't replace the joins.)
Hence the execution plan:
-----------------------------------------+-----------------------------------+
| Id | Operation | Name | Rows | Bytes | Cost | Time |
-----------------------------------------+-----------------------------------+
| 0 | SELECT STATEMENT | | | | 7 | |
| 1 | SORT AGGREGATE | | 1 | 13 | | |
| 2 | MERGE JOIN CARTESIAN | | 4 | 52 | 7 | 00:00:01 |
| 3 | TABLE ACCESS FULL | DEMO | 2 | 26 | 3 | 00:00:01 |
| 4 | BUFFER SORT | | 2 | | 4 | 00:00:01 |
| 5 | TABLE ACCESS FULL | DEMO | 2 | | 2 | 00:00:01 |
-----------------------------------------+-----------------------------------+
Query Block Name / Object Alias(identified by operation id):
------------------------------------------------------------
1 - SEL$58A6D7F6
3 - SEL$58A6D7F6 / DEMO_0001#SEL$1
5 - SEL$58A6D7F6 / DEMO_0002#SEL$1
------------------------------------------------------------
Predicate Information:
----------------------
3 - filter("DEMO"."X" IS NOT NULL)
Alternatively, let's see what dbms_utility.expand_sql_text does with it. I'm not quite sure what to make of this given the trace file above, but it shows a similar expansion taking place:
SQL> var result varchar2(1000)
SQL> exec dbms_utility.expand_sql_text('select count(*) from demo natural join demo', :result)
PL/SQL procedure successfully completed.
RESULT
----------------------------------------------------------------------------------------------------------------------------------
SELECT COUNT(*) "COUNT(*)" FROM (SELECT "A2"."X" "X" FROM "WILLIAM"."DEMO" "A3","WILLIAM"."DEMO" "A2" WHERE "A2"."X"="A2"."X") "A1"
Lesson: NATURAL JOIN is evil. Everybody knows this.
I would like to create one location on SQL Server where I store the report date, so that all queries and procedures refer to this one value.
That way I only have to change the report date in one place and it applies to all related queries and procedures.
I started with a scalar function that retrieves the value from a table, but this slows down the queries enormously.
I tried an inline table-valued function, but have no idea how to include it in a query.
I also tried a table that contains the report date, combined with a cross join.
But it says:
The multi-part identifier could not be bound
Maybe some of you have an idea what to do here?
One possibility is to create a table, let's say TblReportDate with two columns: id and reportDate.
Then add one row with id 1 like following:
+----+------------+
| id | reportDate |
+----+------------+
| 1 | 04.04.2018 |
+----+------------+
Now join the table with a LEFT JOIN and use the >= operator to compare with the id-column of the main-table:
SELECT * FROM mainTable
LEFT JOIN TblReportDate ON mainTable.id >= TblReportDate.id
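A variation on the same idea is to CROSS JOIN the one-row table and compare against its reportDate column directly. The sketch below is only an illustration under assumptions: it runs on SQLite through Python's sqlite3 module instead of SQL Server, the columns bookedOn and amount on mainTable are hypothetical, and dates are stored as ISO strings so that plain comparison works.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Single-row settings table; the CHECK keeps it from growing past one row.
    CREATE TABLE TblReportDate (id INTEGER PRIMARY KEY CHECK (id = 1), reportDate TEXT);
    INSERT INTO TblReportDate VALUES (1, '2018-04-04');

    -- Hypothetical main table.
    CREATE TABLE mainTable (id INTEGER, bookedOn TEXT, amount INTEGER);
    INSERT INTO mainTable VALUES
        (1, '2018-04-01', 10), (2, '2018-04-04', 20), (3, '2018-04-09', 30);
""")

# Every query reads the report date from the same place.
query = """
    SELECT m.id, m.amount, r.reportDate
    FROM mainTable m
    CROSS JOIN TblReportDate r
    WHERE m.bookedOn <= r.reportDate
    ORDER BY m.id
"""
before = conn.execute(query).fetchall()

# Changing the date in one place changes the result of every such query.
conn.execute("UPDATE TblReportDate SET reportDate = '2018-04-10' WHERE id = 1")
after = conn.execute(query).fetchall()

print(before)
print(after)
```

Because the settings table has exactly one row, the cross join does not multiply the rows of mainTable; it only makes reportDate available as a column.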
I have the following two given tables, which cannot be changed.
1: DataTypes
+----------------------+-----------------------+
| datatypename(String) | datatypetable(String) |
+----------------------+-----------------------+
Example data:
+-----------+------------+
| CycleTime | datalong   |
| InjTime1  | datadouble |
+-----------+------------+
2: datalong_1 (data model does not matter here)
I want to make a query now that reads the datatypetable attribute from the datatypes table, adds the String "_1" to it and selects all content from it.
I imagined it, from a programmatic perspective, to look something similar to this statement which obviously doesn't work yet:
SELECT * FROM
(SELECT datatypetable FROM datatypes WHERE datatypename = 'CycleTime') + '_1'
How can I make this happen in SQL using HSQLDB?
Thanks to Leonidas199x I now know how to append the '_1', but how do I tell the FROM clause that the subselect is not a new table to read from, but the name of an existing table I want to read from?
SELECT * FROM
(SELECT RTRIM(datatypetable)+'_1' FROM datatypes WHERE datatypename = 'CycleTime')
According to this question, which is identical to mine, this is not possible:
using subquery instead of the tablename
:(
Can you explain your data model in a little more detail? I am not sure I understand exactly what it is you are looking to do.
If you are wanting to add _1 to the 'datatypename', you can use:
SELECT datatypename+'_1'
FROM datatypes
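Since a table name cannot be supplied by a subquery, one common way around it is to do the lookup in two steps from the calling program: read the table name first, then build the second statement from it. The sketch below only illustrates that pattern and rests on assumptions: it uses Python's sqlite3 module instead of HSQLDB, the helper read_data_table and the columns of datalong_1 are hypothetical, and the existence check against sqlite_master is SQLite-specific (another engine would need its own catalog query).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE datatypes (datatypename TEXT, datatypetable TEXT);
    INSERT INTO datatypes VALUES ('CycleTime', 'datalong'), ('InjTime1', 'datadouble');

    -- Hypothetical columns; the question says the data model does not matter.
    CREATE TABLE datalong_1 (ts INTEGER, value INTEGER);
    INSERT INTO datalong_1 VALUES (1, 100), (2, 200);
""")

def read_data_table(conn, datatypename, suffix="_1"):
    """Look up the base table name, append the suffix, and read that table."""
    row = conn.execute(
        "SELECT datatypetable FROM datatypes WHERE datatypename = ?",
        (datatypename,),
    ).fetchone()
    if row is None:
        raise LookupError(f"unknown data type: {datatypename}")
    table = row[0].strip() + suffix

    # Identifiers cannot be bound as parameters, so only accept a name
    # that really exists in the catalog before putting it into the SQL text.
    exists = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
        (table,),
    ).fetchone()
    if exists is None:
        raise LookupError(f"no such table: {table}")
    return conn.execute(f'SELECT * FROM "{table}"').fetchall()

rows = read_data_table(conn, "CycleTime")
print(rows)
```

The catalog check matters because the table name ends up in the statement text, where parameter binding offers no protection.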
CREATE TABLE `names` ( `name` varchar(20) );
Assume the names table contains all 40 million first names of everyone living in California (for example).
SELECT count(*) as count, name FROM names GROUP BY name ORDER BY name;
How can I optimize this query?
Expected Result:
count | name
9999 | joe
9995 | mike
9990 | kate
.... | ....
2 | kal-el
You have to create an index on the name column of your table. The query is as good as it can be.
Well, what makes you think it's not already optimised? This looks like the sort of query a good database engine should be able to handle relatively easily - particularly if you've got an appropriate index on your table.
Do you actually have a bottleneck here, or are you worrying about something that might happen in the future? If it's the latter, I suggest you try it with your RDBMS (by generating dummy data), and see what happens.
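As a starting point for that kind of experiment, here is a small sketch that loads dummy data, compares the query plan before and after adding an index on name, and runs the query. It is an illustration under assumptions: it uses SQLite through Python's sqlite3 module rather than MySQL, the index name idx_names_name is made up, and plan output differs between engines and versions, so the sketch prints the plan rather than promising a particular text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE names (name TEXT)")

# Tiny dummy data set; scale this up to test realistic volumes.
sample = ["joe"] * 5 + ["mike"] * 3 + ["kate"] * 2 + ["kal-el"]
conn.executemany("INSERT INTO names VALUES (?)", [(n,) for n in sample])

query = "SELECT COUNT(*) AS count, name FROM names GROUP BY name ORDER BY name"

def plan(conn, sql):
    """Return the human-readable detail column of SQLite's query plan."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

plan_before = plan(conn, query)
conn.execute("CREATE INDEX idx_names_name ON names (name)")
plan_after = plan(conn, query)

counts = conn.execute(query).fetchall()

print(plan_before)
print(plan_after)
print(counts)
```

With the index in place the engine can read the names in sorted order straight from the index, so grouping and ordering need no separate sort step; on MySQL the equivalent check would be EXPLAIN on the same query.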