How to join 2 tables without common fields? - sql

There are 2 tables:
Table 1: first_names
id | first_name
1 | Joey
7 | Ross
17| Chandler
Table 2: last_names
id | last_name
2 | Tribbiani
7 | Geller
25| Bing
Desired result:
id | full_name
1 | Joey Tribbiani
2 | Ross Geller
3 | Chandler Bing
Task:
Write the solution using only the simplest SQL syntax. Using stored procedures, declaring variables, and the ROW_NUMBER() and RANK() functions are forbidden.
I have a solution using the ROW_NUMBER() function, but no ideas about solving this task using only the simplest SQL syntax.
P.S. I'm only a trainee and it's my first question on Stack Overflow.

Simple join will suffice here
select * from first_names fn
join last_names ln on fn.id = ln.id - 1
However, your question is very unclear, because the join here is based on knowledge of the Friends series rather than on concrete logic...

You must create an id to join the tables.
This can be the row's position in the table when ordered by id:
select
f.counter id, concat(f.first_name, ' ', l.last_name) full_name
from (
select t.*, (select count(*) from first_names where id < t.id) + 1 counter
from first_names t
) f inner join (
select t.*, (select count(*) from last_names where id < t.id) + 1 counter
from last_names t
) l
on l.counter = f.counter
See the demo.
Results:
> id | full_name
> -: | :-------------
> 1 | Joey Tribbiani
> 2 | Ross Geller
> 3 | Chandler Bing
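The counting trick above can be tried end to end with a small script. This is only a sketch: it uses Python's built-in sqlite3 module as a stand-in database (an assumption, the question does not name an engine), and it replaces concat() with SQLite's || operator. The idea is the same as in the answer: a row's position is 1 plus the number of rows with a smaller id, which only behaves like a row number when ids are unique within each table.

```python
import sqlite3

# In-memory SQLite database used purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE first_names (id INTEGER, first_name TEXT);
    CREATE TABLE last_names  (id INTEGER, last_name  TEXT);
    INSERT INTO first_names VALUES (1, 'Joey'), (7, 'Ross'), (17, 'Chandler');
    INSERT INTO last_names  VALUES (2, 'Tribbiani'), (7, 'Geller'), (25, 'Bing');
""")

# Position of a row = 1 + number of rows in the same table with a smaller id.
# No ROW_NUMBER(), RANK(), variables or stored procedures are involved.
PAIR_BY_POSITION = """
    SELECT f.counter AS id, f.first_name || ' ' || l.last_name AS full_name
    FROM (SELECT t.*, (SELECT COUNT(*) FROM first_names WHERE id < t.id) + 1 AS counter
          FROM first_names t) f
    JOIN (SELECT t.*, (SELECT COUNT(*) FROM last_names WHERE id < t.id) + 1 AS counter
          FROM last_names t) l
      ON l.counter = f.counter
    ORDER BY f.counter
"""

rows = conn.execute(PAIR_BY_POSITION).fetchall()
for row in rows:
    print(row)
```

Each correlated subquery rescans its table once per row, so this is fine for a three-row exercise but scales poorly compared with a window function.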

Honestly, this is a stupid solution; it's vastly less efficient than ROW_NUMBER, and I wouldn't be surprised if LEAD is "not allowed", as ROW_NUMBER isn't. The fact that you were told to "use the simplest SQL" means that the SQL you want to use is a subquery/CTE and ROW_NUMBER; that is as simple as this can really go. Anything else adds a layer of unneeded complexity and will likely just make the query suffer from performance degradation. This one, for example, means you need to scan both tables twice, whereas with ROW_NUMBER it would be once.
CREATE TABLE FirstNames (id int, FirstName varchar(10));
CREATE TABLE LastNames (id int, LastName varchar(10));
INSERT INTO FirstNames
VALUES(1,'Joey'),
(7,'Ross'),
(17,'Chandler');
INSERT INTO LastNames
VALUES (2,'Tribbiani'),
(7,'Geller'),
(25,'Bing');
GO
WITH CTE AS(
SELECT FN.id,
FN.FirstName,
LN.LastName
FROM FirstNames FN
LEFT JOIN LastNames LN ON FN.id = LN.id
UNION ALL
SELECT LN.id,
FN.FirstName,
LN.LastName
FROM LastNames LN
LEFT JOIN FirstNames FN ON LN.id = FN.id
WHERE FN.id IS NULL),
FullNames AS(
SELECT C.id,
C.FirstName,
ISNULL(C.LastName, LEAD(C.LastName) OVER (ORDER BY id)) AS LastName
FROM CTE C)
SELECT *
FROM FullNames FN
WHERE FN.FirstName IS NOT NULL
ORDER BY FN.id;
GO
DROP TABLE FirstNames;
DROP TABLE LastNames;
To answer the "Task" given:
"Task: Write the solution using only the simplest SQL syntax. Using store procedures, declaring variables, ROW_NUMBER(), RANK() functions are forbidden."
My answer would be the below:
"Why is this a requirement? SQL Server has supported ROW_NUMBER for 14 years, since SQL Server 2005. If you can't use ROW_NUMBER, this implies you're using SQL Server 2000. This is actually a big security problem for the company, as 2000 has been out of support for close to a decade. Legislation like GDPR requires a company to keep the technology it uses secure, and it is very unlikely that this requirement is being met.
If this is the case, the solution is not to find a way around using ROW_NUMBER but to get the company back up to date. The latest version of SQL Server that you can upgrade to from SQL Server 2000 is 2008, which also runs out of support on July 9 of this year. We'll need to get an instance up and running, move the existing features onto this new server, and get QA testing done as soon as possible. This needs to be the highest priority. After that we need to repeat the cycle with another version of SQL Server. The latest is 2017, which does support migration from 2008.
Once we've done that, we can then actually make use of ROW_NUMBER in the query; providing the simplest solution and also bringing the company back into a secure environment."
Sometimes requirements need to be challenged. From experience management can make some "stupid" requirements, because they don't understand the technology. When you're in an IT role, sometimes you will need to question those requirements and explain why the requirement isn't actually a good idea. Then, instead, you can aid Management to find the correct solution for the problem. At the end of the day, what they might be trying to fix could be an XY problem; and part of your troubleshooting will be to find out what X really is.

Related

SQL: Taking one column from two tables and putting them into one predefined table

Just a little bug off my shoulder, but for what I'm using this code for, it is not the end of the world if this one doesn't get answered. To preface, a few things: I know this is entirely improper, I know this should never be used -- let alone, done -- in a production environment, and I know that the root of this operation is totally unconventional, but I'm asking anyway:
If I have two tables with a set of values that I am looking to grab and put into one other, combined and predefined table, side by side, how might I do that?
Right now, I have two statements doing
INSERT INTO table ('leftCol') SELECT NAME FROM smolT1 ORDER BY num DESC LIMIT 3
INSERT INTO table ('rightCol') SELECT NAME FROM smolT2 ORDER BY num DESC LIMIT 3
but, as one would imagine, that query ends up with something like...
leftCol | rightCol
Jack |
James |
John |
| Jill
| Justina
| Jesebelle
and of course, it would be much more preferred if the left and right column lined up, though, for the sake of gathering just those six records, I suppose it is not too big of a concern.
To add on, yes, these two tables do have a NAME in common, but with how I am querying them, they are totally irrelevant to one another and should not be associated with one another, just displayed side by side.
I am simply curious as to whether or not one query would get these two unrelated queries to work together and print neatly into a form or if I just have to live with this data looking like this.
Cheers!
The most recent versions of SQLite support window functions. This allows you to do:
select min(name1) as name1, min(name2) as name2
from (select name as name1, null as name2, row_number() over (order by name) as seqnum
from smolt1
where name is not null
union all
select null, name, row_number() over (order by name) as seqnum
from smolt2
where name is not null
) lr
group by seqnum;
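To see the rows line up, here is a small sketch run through Python's sqlite3 module. It assumes the bundled SQLite is 3.25 or newer (the first release with window functions), and the num values are made up for illustration. Unlike the query above, it numbers rows by num DESC to mirror the ordering in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE smolT1 (name TEXT, num INTEGER);
    CREATE TABLE smolT2 (name TEXT, num INTEGER);
    -- num values are hypothetical; the question does not show them.
    INSERT INTO smolT1 VALUES ('Jack', 3), ('James', 2), ('John', 1);
    INSERT INTO smolT2 VALUES ('Jill', 3), ('Justina', 2), ('Jesebelle', 1);
""")

# Number the rows of each table independently, stack them with UNION ALL,
# then collapse rows sharing a sequence number. MIN() ignores NULLs, so each
# group keeps the one non-NULL name from each side.
SIDE_BY_SIDE = """
    SELECT MIN(name1) AS leftCol, MIN(name2) AS rightCol
    FROM (SELECT name AS name1, NULL AS name2,
                 ROW_NUMBER() OVER (ORDER BY num DESC) AS seqnum
          FROM smolT1
          UNION ALL
          SELECT NULL, name,
                 ROW_NUMBER() OVER (ORDER BY num DESC) AS seqnum
          FROM smolT2) lr
    GROUP BY seqnum
    ORDER BY seqnum
"""

pairs = conn.execute(SIDE_BY_SIDE).fetchall()
for left, right in pairs:
    print(left, "|", right)
```

The result of this SELECT could feed a single INSERT INTO ... SELECT, which would fill both columns of the target table in one statement.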

sql select record with lowest value of the two

Despite my internet searching, I've not found a solution to what I think is a simple SQL problem.
I have a simple table as such:
zip | location | transit
------------------------
10001 | 1 | 5
10001 | 2 | 2
This table of course has a large number of zip codes, but I'd like to make a simple query by zip code and, instead of returning all rows with the zip, return only a single row (with all 3 columns) that contains the lowest transit value.
I've been playing with the aggregate function min(), but haven't gotten it right.
Using PostgreSQL 9.6
Thanks!
Use ORDER BY along with LIMIT :
SELECT t.*
FROM mytable t
WHERE t.zip = ?
ORDER BY t.transit
LIMIT 1
How about
select * from table where zip = '10001' order by transit limit 1
I would use distinct on:
select distinct on (zip) t.*
from t
order by zip, transit;
This is usually the most efficient method in Postgres, particularly with an index on (zip, transit).
Of course if you have only one zip code that you care about, then where/order by/limit is also totally reasonable.
Assuming that you also want to return the location value associated with the minimum transit value, then here is one possible solution using an inner join:
select t.*
from
yourtable t inner join
(select u.zip, min(u.transit) as mt from yourtable u group by u.zip) v
on t.zip = v.zip and t.transit = v.mt
Change all references to yourtable to the name of your table.
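A quick way to compare the join-on-minimum form with the ORDER BY / LIMIT form is to run both against the sample rows. The sketch below uses Python's sqlite3 module rather than Postgres (so DISTINCT ON is not shown), and the table name transit_times and the second zip code are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- transit_times is a hypothetical table name; 10002 rows are made up.
    CREATE TABLE transit_times (zip TEXT, location INTEGER, transit INTEGER);
    INSERT INTO transit_times VALUES
        ('10001', 1, 5), ('10001', 2, 2), ('10002', 1, 4), ('10002', 3, 7);
""")

# One row per zip: join each row to the minimum transit of its zip.
LOWEST_PER_ZIP = """
    SELECT t.zip, t.location, t.transit
    FROM transit_times t
    JOIN (SELECT zip, MIN(transit) AS mt FROM transit_times GROUP BY zip) v
      ON t.zip = v.zip AND t.transit = v.mt
    ORDER BY t.zip
"""

# A single zip: sort by transit and keep the first row.
ONE_ZIP = """
    SELECT zip, location, transit
    FROM transit_times
    WHERE zip = ?
    ORDER BY transit
    LIMIT 1
"""

all_zips = conn.execute(LOWEST_PER_ZIP).fetchall()
single = conn.execute(ONE_ZIP, ("10001",)).fetchone()
print(all_zips)
print(single)
```

One difference worth knowing: if two locations tie for the lowest transit in a zip, the join form returns both rows, while LIMIT 1 returns only one of them.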

How do I use variables in a select query?

I have this following select query that uses a scalar function to get full name. I want to eliminate the redundancy by using variable but so far there is no success. My query is
select
a.Id,
a.UserName,
getFullName(a.UserName),
a.CreateTime
from DataTable a;
I don't want to retrieve 'a.UserName' two times. I would prefer to save a.UserName in a variable and then pass it to the function, hence improving the efficiency.
Currently the work around I came up with is as following
select
Id,
UserName,
getFullName(UserName),
CreateTime
from (select a.Id, a.UserName, a.CreateTime from DataTable a) temp
This solves the performance issue but adds the overhead of writing the same select twice. Any other suggestions would be great.
DataTable looks like this
+----+----------+------------+
| Id | UserName | CreateTime |
+----+----------+------------+
| 1 | ab | 10:00 |
| 2 | cd | 11:00 |
| 3 | ef | 12:00 |
+----+----------+------------+
Here is the NamesTable used to get the full names
+----------+----------+
| UserName | FullName |
+----------+----------+
| ab | Aa BB |
| cd | Cc Dd |
| ef | Ee Ff |
+----------+----------+
Here is the function that gets the full name
Create function [dbo].[getFullName](@user varchar(150)) returns varchar(500)
as
begin
declare @Result varchar(500);
select @Result = FullName from dbo.NamesTable where UserName = @user;
return @Result;
end;
You're solving a problem that doesn't exist. You seem to think that
select
a.Id,
a.UserName,
getFullName(a.UserName),
a.CreateTime
from DataTable a;
has some relatively expensive process behind it to get UserName that is happening twice. In reality, once the record is located, getting the UserName value is a virtually instant process since it will probably be stored in a "variable" by the SQL engine behind the scenes. You should have little to no performance difference between that query and
select
a.Id,
getFullName(a.UserName),
a.CreateTime
from DataTable a;
The scalar function itself may have a performance issue, but it's not because you are "pulling" the UserName value "twice".
A better method would be to join to the other table:
select
a.Id,
a.UserName,
b.FullName,
a.CreateTime
from DataTable a
LEFT JOIN dbo.NamesTable b
ON a.UserName = b.UserName
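As a rough check of the join approach, the sample tables from the question can be loaded into an in-memory SQLite database through Python's sqlite3 module (a stand-in for SQL Server here). The fourth DataTable row, user 'zz', is invented to show what LEFT JOIN does when no matching name exists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE DataTable  (Id INTEGER, UserName TEXT, CreateTime TEXT);
    CREATE TABLE NamesTable (UserName TEXT, FullName TEXT);
    INSERT INTO DataTable VALUES
        (1, 'ab', '10:00'), (2, 'cd', '11:00'), (3, 'ef', '12:00'),
        (4, 'zz', '13:00');  -- hypothetical user with no NamesTable row
    INSERT INTO NamesTable VALUES ('ab', 'Aa BB'), ('cd', 'Cc Dd'), ('ef', 'Ee Ff');
""")

# The lookup is expressed once as a set-based join instead of a per-row
# scalar function call. LEFT JOIN keeps users that have no full name.
WITH_FULL_NAME = """
    SELECT a.Id, a.UserName, b.FullName, a.CreateTime
    FROM DataTable a
    LEFT JOIN NamesTable b ON a.UserName = b.UserName
    ORDER BY a.Id
"""

result = conn.execute(WITH_FULL_NAME).fetchall()
for row in result:
    print(row)
```

The unmatched row comes back with a NULL full name (None in Python), which matches what the scalar function would return for an unknown user; an INNER JOIN would drop that row instead.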
As D Stanley says, you're trying to solve some problem that doesn't exist. I would further add that you shouldn't be using the function at all. SQL is meant to perform set-based operations. When you use a function like that you're now making it perform the same function over and over again for every row - a horrible practice. Instead, just JOIN in the other table (a set-based operation) and let SQL do what it does best:
SELECT
DT.Id,
DT.UserName,
NT.fullname,
DT.CreateTime
FROM
DataTable DT
INNER JOIN NamesTable NT ON NT.username = DT.username;
Also, DataTable and NamesTable are terrible names for tables. Of course they're tables, so there's no need to put "table" on the end of the name. Further, of course the first one holds "data", it's a database. Your table names should be descriptive. What exactly does DataTable hold?
If you're going to be doing SQL development in the future then I strongly suggest that you read several introductory books on the subject and watch as many tutorial videos as you can find.
A scalar UDF will execute for every row, but definitely not the way you think. Below is a sample demo and execution plan which prove the same.
create table testid
(
id int,
name varchar(20)
)
insert into testid
select n,'abc'
from numbers
where n<=1000000
create index nci_get on dbo.testid(id,name)
select id,name,dbo.getusername(id) from dbo.testid where id>4
Below is the execution plan for the above query.
Decoding the above plan:
The index seek outputs id and name.
Then the compute scalar calculates new values from the existing row values; in this case Expr1003, which is our function.
The index seek cost is 97% and the compute scalar cost is 3%. As you might be aware, an index seek is not an operator which goes to the table to get data, so hopefully this answers your question.

Best practice for setup and querying versioned records in T-SQL

I'm trying to optimize my SQL queries and I always come back to this one issue and I was hoping to get some insight into how I could best optimize this.
For brevity, let's say I have a simple employee table:
tbl_employees
Id HiredDateTime
------------------
1 ...
2 ...
That has versioned information in another table for each employee:
tbl_employees_versioned
Id Version Name HourlyWage
-------------------------------
1 1 Bob 10
1 2 Bob 20
1 3 Bob 30
2 1 Dan 10
2 2 Dan 20
And this is how the latest version records are retrieved in a View:
Select tbl_employees.Id, employees_LatestVersion.Name, employees_LatestVersion.HourlyWage, employees_LatestVersion.Version
From tbl_employees
Inner Join tbl_employees_versioned
ON tbl_employees.Id = tbl_employees_versioned.Id
CROSS APPLY
(SELECT Id, Max(Version) AS Version
FROM tbl_employees_versioned AS employees_LatestVersion
WHERE Id = tbl_employees_versioned.Id
GROUP BY Id) AS employees_LatestVersion
To get a response like this:
Id Version Name HourlyWage
-------------------------------
1 3 Bob 30
2 2 Dan 20
When running a query over more than 500 employee records, each of which has a few versions, this query starts choking and takes a few seconds to run.
There are a couple strikes right off the bat, but I'm not sure how to overcome them.
Obviously the Cross Apply adds some performance loss. Is there a best practice when dealing with versioned information like this? Is there a better way to get just a record with the highest version?
The versioned table doesn't have a clustered index because neither Id nor Version is unique. Concatenated together they would be, but it doesn't work like that. Instead, there is a non-clustered index for Id and another one for Version. Is there a better way to index this table to get any performance gain? Would an indexed view really help here?
I think the best way to structure the data is using start dates and end dates. So, the data structure for your original table would look like:
create table tbl_EmployeesHistory (
EmployeeHistoryId int,
EffDate date not null,
EndDate date,
-- Fields that describe the employee during this time
)
Then, you can see the current version using a view:
create view vw_Employees as
select *
from tbl_EmployeesHistory
where EndDate is NULL
In some cases, where future end dates are allowed, the where clause would be:
where coalesce(EndDate, getdate()) >= getdate()
Alternatively, in this case, you can default EndDate to some future date far, far away such as '01-01-9999'. You would add this as the default in the create table statement, make the column not null, and then you can always use the statement:
where getdate() between EffDate and EndDate
As Martin points out in his comment, the coalesce() might impede the use of an index (it does in SQL Server), whereas this does not have that problem.
This is called a slowly changing dimension. Ralph Kimball discusses this concept at some length in his books on data warehousing.
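Here is a minimal sketch of the start-date / end-date layout with a far-future default, run through Python's sqlite3 module. Several things are assumptions made for the example: the EmployeeId, Name and HourlyWage columns, the sample dates, ISO date strings stored as text, and a half-open interval test (EffDate <= d < EndDate) used instead of BETWEEN so that a date falling exactly on a boundary matches only one row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl_EmployeesHistory (
        EmployeeId INTEGER NOT NULL,   -- hypothetical key column
        EffDate    TEXT    NOT NULL,   -- ISO dates compare correctly as text
        EndDate    TEXT    NOT NULL DEFAULT '9999-01-01',
        Name       TEXT,
        HourlyWage INTEGER
    );
    INSERT INTO tbl_EmployeesHistory VALUES
        (1, '2020-01-01', '2021-01-01', 'Bob', 10),
        (1, '2021-01-01', '2022-01-01', 'Bob', 20),
        (2, '2020-03-01', '2021-06-01', 'Dan', 10);
    -- Open-ended rows omit EndDate and pick up the far-future default.
    INSERT INTO tbl_EmployeesHistory (EmployeeId, EffDate, Name, HourlyWage) VALUES
        (1, '2022-01-01', 'Bob', 30),
        (2, '2021-06-01', 'Dan', 20);
""")

# No COALESCE is needed because EndDate is never NULL.
AS_OF = """
    SELECT EmployeeId, Name, HourlyWage
    FROM tbl_EmployeesHistory
    WHERE EffDate <= :d AND :d < EndDate
    ORDER BY EmployeeId
"""

def employees_as_of(as_of):
    """Return the version of every employee that was in effect on as_of."""
    return conn.execute(AS_OF, {"d": as_of}).fetchall()

print(employees_as_of("2023-06-15"))
print(employees_as_of("2020-06-01"))
```

The same predicate serves both "current" and historical lookups, which is the main practical gain over keeping only a version number.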
Here's one way you can get a view of the most recent version for each employee:
Select Id, Name, HourlyWage, Version
FROM (
Select E.Id, V.Name, V.HourlyWage, V.Version,
row_number() OVER (PARTITION BY V.ID ORDER BY V.Version DESC) as nRow
From tbl_employees E
Inner Join tbl_employees_versioned V ON E.Id = V.Id
) A
WHERE A.nRow = 1
I suspect that this will perform better than your previous solution. One index across Id and Version in tbl_employees_versioned would most likely also help.
Also, note that you only need to join on tbl_employees if you're selecting fields that are not in tbl_employees_versioned.
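The ROW_NUMBER approach together with a single index across Id and Version can be sketched as follows, using Python's sqlite3 module in place of SQL Server (window functions need SQLite 3.25 or newer). Declaring (Id, Version) as a composite primary key is an assumption standing in for the suggested index; it also addresses the point from the question that the two columns are only unique in combination.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl_employees_versioned (
        Id         INTEGER NOT NULL,
        Version    INTEGER NOT NULL,
        Name       TEXT,
        HourlyWage INTEGER,
        PRIMARY KEY (Id, Version)   -- one composite key instead of two separate indexes
    );
    INSERT INTO tbl_employees_versioned VALUES
        (1, 1, 'Bob', 10), (1, 2, 'Bob', 20), (1, 3, 'Bob', 30),
        (2, 1, 'Dan', 10), (2, 2, 'Dan', 20);
""")

# Number the versions of each employee from newest to oldest and keep the
# first one. tbl_employees is not joined because every selected column
# lives in the versioned table.
LATEST_VERSION = """
    SELECT Id, Version, Name, HourlyWage
    FROM (SELECT v.*,
                 ROW_NUMBER() OVER (PARTITION BY v.Id ORDER BY v.Version DESC) AS nRow
          FROM tbl_employees_versioned v) a
    WHERE a.nRow = 1
    ORDER BY Id
"""

latest = conn.execute(LATEST_VERSION).fetchall()
for row in latest:
    print(row)
```

In SQL Server the equivalent would be a clustered primary key or a unique index on (Id, Version); whether it helps in a given database is best confirmed from the execution plan.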

Loop through without Cursor in SQL Server 2005

I have a table OrganisationStructure like this:
OrganisationID INT
ParentOrganisationID INT
OrganisationName VARCHAR(64)
1 | 0 | Company
2 | 1 | IT DIVISION
3 | 2 | IT SYSTEM BUSINESS UNIT
4 | 1 | MARKETING DIVISION
5 | 4 | DONATION BUSINESS UNIT
I want a query such that, if the app passes, let's say, OrganisationID = 1, it will loop (looking at parent/child) through to the end of this table and grab all possible returned OrganisationIDs = (1, 2, 3, 4, 5).
Otherwise, if passing OrganisationID = 2, then returned OrganisationIDs = (2, 3).
Otherwise, if passing OrganisationID = 3, then returned OrganisationID = 3.
Any ideas to do this without cursor?
Thanks
You can use SQL 2005 CTEs to make the SQL engine do it recursively.
An enumeration of basic approaches is at http://blogs.msdn.com/anthonybloesch/archive/2006/02/15/Hierarchies-in-SQL-Server-2005.aspx
Celko also has a trees in SQL book which covers all of this to the nth degree.
Or you can brute force it by selecting each level into a local table variable and then looping, inserting children with a select, until your @@ROWCOUNT is zero (i.e., you're not finding any more children). If you don't have a lot of data, this is easy to code, but you hinted that you're looking for performance by saying you don't want a cursor.
declare @rootID int;
select @rootID = 4;
with cte_anchor as (
SELECT OrganisationID
, ParentOrganisationID
, OrganisationName
FROM OrganisationStructure
WHERE OrganisationID = @rootID)
, cte_recursive as (
SELECT OrganisationID
, ParentOrganisationID
, OrganisationName
FROM cte_anchor
UNION ALL
SELECT o.OrganisationID
, o.ParentOrganisationID
, o.OrganisationName
FROM OrganisationStructure o JOIN cte_recursive r
ON o.ParentOrganisationID = r.OrganisationID)
SELECT * FROM cte_recursive
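The recursive CTE can be checked against the three cases from the question with a short script. This sketch uses Python's sqlite3 module instead of SQL Server 2005, so the syntax differs slightly: SQLite spells it WITH RECURSIVE and the root is passed as a named parameter rather than a T-SQL variable. The helper name subtree_ids is made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE OrganisationStructure (
        OrganisationID       INTEGER,
        ParentOrganisationID INTEGER,
        OrganisationName     TEXT
    );
    INSERT INTO OrganisationStructure VALUES
        (1, 0, 'Company'),
        (2, 1, 'IT DIVISION'),
        (3, 2, 'IT SYSTEM BUSINESS UNIT'),
        (4, 1, 'MARKETING DIVISION'),
        (5, 4, 'DONATION BUSINESS UNIT');
""")

# Anchor member: the requested organisation.
# Recursive member: every row whose parent is already in the result.
DESCENDANTS = """
    WITH RECURSIVE subtree(OrganisationID) AS (
        SELECT OrganisationID
        FROM OrganisationStructure
        WHERE OrganisationID = :root
        UNION ALL
        SELECT o.OrganisationID
        FROM OrganisationStructure o
        JOIN subtree s ON o.ParentOrganisationID = s.OrganisationID
    )
    SELECT OrganisationID FROM subtree ORDER BY OrganisationID
"""

def subtree_ids(root):
    """Return the root organisation and all of its descendants."""
    return [r[0] for r in conn.execute(DESCENDANTS, {"root": root})]

for root in (1, 2, 3):
    print(root, subtree_ids(root))
```

The recursion stops on its own once a level has no children, so no cursor or explicit loop is needed. It does assume the data contains no cycles.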
In SQL Server 2005 it is possible to do recursive queries with Common Table Expressions. For an example see 'Recursive Common Table Expressions' in Common Table Expressions (CTE) in SQL Server 2005 from 4guysfromrolla.
How many levels deep can your parent child structure go ?
You could do a self-join on the table to line up grand-parent / parent / child entities, but that's limited by the number of levels deep your parent/child relationships can go.
I know you've stated SQL 2005, but just so you're aware, this kind of tree structure mapping is exactly what the new HierarchyID type in SQL Server 2008 is for.
Try this for 3 levels using plain vanilla simple brute force - you can add levels as required.
SELECT DISTINCT OrganisationID
FROM
(
-- level 0: the organisation itself
SELECT
OrganisationID
FROM OrganisationStructure
WHERE OrganisationID = @arg
UNION ALL
-- level 1: direct children
SELECT
OrganisationID
FROM OrganisationStructure
WHERE ParentOrganisationID = @arg
UNION ALL
-- level 2: grandchildren
SELECT os2.OrganisationID
FROM OrganisationStructure os
JOIN OrganisationStructure os2 ON os.OrganisationID = os2.ParentOrganisationID
WHERE os.ParentOrganisationID = @arg
) data
I believe the question is answered well enough, however if you're interested in alternative methods of structuring your data for better effect, google for 'evolt ways to work with hierarchical data'
I'm not allowed to post links yet :)