I'm working with a Pro*C query, but this question should be general SQL. My research has been a dead end, but I think I'm missing something.
Suppose my server has an array of students' names, {"Alice","Charlie","Bob"}. I query the Student_Ids table for the students' ID numbers:
SELECT id_no FROM student_ids
WHERE student_name IN ('Alice','Charlie','Bob');
To simplify server-side processing, I want to sort the result set in the same order as the students' names. In other words, the result set would be {alice_id_no, charlie_id_no, bob_id_no} regardless of the actual ordering of the table or the behavior of any particular vendor's implementation.
The only solution I can think of is:
. . .
ORDER BY
CASE WHEN student_name='Alice' THEN 0
WHEN student_name='Charlie' THEN 1
WHEN student_name='Bob' THEN 2 END;
but that seems extremely messy and cumbersome when trying to dynamically generate/run this query.
Is there a better way?
UPDATE I gave a terrible example by pre-sorting the students' names. I changed the names to be deliberately unsorted. In other words, I want to sort the names in a non-ASC or DESC-friendly way.
UPDATE II Oracle, but for knowledge's sake, I am looking for more general solutions as well.
The ORDER BY expression you've given for your sample data is equivalent to ORDER BY student_name. Is that what you intended?
If you want a custom ordering that is not alphabetical, I think you might have meant something like this:
ORDER BY
CASE
WHEN student_name = 'Alice' THEN 0
WHEN student_name = 'Charlie' THEN 1
WHEN student_name = 'Bob' THEN 2
END;
You can use a derived table as well, that holds the names as well as the ordering you want. This way you only have to put the names in a single time:
SELECT S.id_no
FROM
student_ids AS S
INNER JOIN (
SELECT Name = 'Alice', Seq = 0 FROM DUAL
UNION ALL SELECT 'Bob', 2 FROM DUAL
UNION ALL SELECT 'Charlie', 1 FROM DUAL
) AS N
ON S.student_name = N.Name
ORDER BY
N.Seq;
You could also put them into a temp table, but in Oracle that could be somewhat of a pain.
Can you do this?
order by student_name
To do a custom sort, you only need one case:
ORDER BY (CASE WHEN student_name = 'Alice' THEN 1
WHEN student_name = 'Bob' THEN 2
WHEN student_name = 'Charlie' THEN 3
ELSE 4
END)
why not this :
SELECT id_no FROM student_ids
WHERE student_name IN ('Alice','Bob','Charlie')
ORDER BY student_name
You can ORDER BY any columns, not necessary those in SELECT list or WHERE clause
SELECT id_no
FROM student_ids
WHERE student_name IN ('Alice','Bob','Charlie)
ORDER BY id_no;
Add a table to hold the sort priorities then you can use the sort_priorities in whatever query you want (and easily update the priorities):
CREATE TABLE student_name_sort_priorities (
student_name VARCHAR2(30) CONSTRAINT student_name_sort_priority__pk PRIMARY KEY,
sort_priority NUMBER(10) CONSTRAINT student_name_sort_priority__nn NOT NULL
CONSTRAINT student_name_sort_priority__u UNIQUE
);
(If you want two values to be equivalently sorted then don't include the UNIQUE constraint.)
INSERT INTO student_name_sort_priorities VALUES ( 'Alice', 0 );
INSERT INTO student_name_sort_priorities VALUES ( 'Charlie', 2 );
INSERT INTO student_name_sort_priorities VALUES ( 'Bob', 1 );
Then you can join the sort priority table with the student_ids table and use the extra column to perform ordering:
SELECT id_no
FROM student_ids s
LEFT OUTER JOIN
student_name_sort_priorities p
ON (s.student_name = p.student_name)
ORDER BY
sort_priority NULLS LAST;
I've used a LEFT OUTER JOIN so that if a name is not contained on the student_name_sort_priorities table then it does not restrict the rows returned from the query; NULLS LAST is used in the ordering for a similar reason - any student names that aren't in the sort priorities table will return a NULL sort priority and be placed at the end of the ordering. If you don't want this then just use INNER JOIN and remove the NULLS LAST.
How about using a 'table of varchar' type like the build-in below:
TYPE dbms_debug_vc2coll is table of varchar2(1000);
test:
SQL> select customer_id, cust_first_name, cust_last_name from customers where cust_first_name in
2 (select column_value from table(sys.dbms_debug_vc2coll('Frederic','Markus','Dieter')));
CUSTOMER_ID CUST_FIRST_NAME CUST_LAST_NAME
----------- -------------------- --------------------
154 Frederic Grodin
149 Markus Rampling
152 Dieter Matthau
That seems to force the order, but that might just be bad luck. I'm not really a sql expert.
The execution plan for this uses 'collection iterator' instead of a big 'or' in the typical:
select customer_id, cust_first_name, cust_last_name from customers where cust_first_name in ('Frederic','Markus','Dieter');
hth, Hein.
Related
I have Two tables that contain employees.
One table has all active employees (Current_Employees) and the other one has employees that have been joined in the last month (Greenhouse_Employees). Sometimes these employees overlap.
I have a unqiue ID (position_ID) that I want to do the following with in the abstract:
If Greenhouse_Employees unique ID exists in or matches to an ID in Current_Employees, ignore, if it does not: append it and its associated columns to the table
Not all of the columns match in either table, some do.
The code below almost works, but if there is a single inconsistency any any column I coalesce, it duplicates the row: (Some employees have inconsistent [loc] (locations) in the tables due to data entry error
SELECT
COALESCE(#CURRENT_EMPLOYEES.[employee_status], [GREENHOUSE_TABLE].[job_status]) AS employment_status
,COALESCE(#CURRENT_EMPLOYEES.[employee_id], 'N/A') AS employee_id
,COALESCE(#CURRENT_EMPLOYEES.[employee_name], CONCAT([GREENHOUSE_TABLE].[first_name],' ',[GREENHOUSE_TABLE].[last_name])) AS employee_name
,COALESCE(#CURRENT_EMPLOYEES.[hire_date], [GREENHOUSE_TABLE].[hire_date]) AS hire_date
,COALESCE(#CURRENT_EMPLOYEES.[salary], [GREENHOUSE_TABLE].[salary]) AS salary
,COALESCE(#CURRENT_EMPLOYEES.[bonus_percent], [GREENHOUSE_TABLE].[annual_bonus]) AS bonus_percent
,COALESCE(#CURRENT_EMPLOYEES.[commission_percent], '0') AS commission_percent
,COALESCE(#CURRENT_EMPLOYEES.[currency], 'N/A') AS currency
,COALESCE(#CURRENT_EMPLOYEES.[company_title], [GREENHOUSE_TABLE].[company_title]) AS company_title
,COALESCE(#CURRENT_EMPLOYEES.[company_department], [GREENHOUSE_TABLE].[company_department]) AS company_department
,COALESCE(#CURRENT_EMPLOYEES.[country], 'N/A') AS country
,COALESCE(#CURRENT_EMPLOYEES.[loc], 'N/A') AS loc
,COALESCE(#CURRENT_EMPLOYEES.[job_level], [GREENHOUSE_TABLE].[job_level]) AS job_level
,COALESCE(#CURRENT_EMPLOYEES.[kamsa_code], [GREENHOUSE_TABLE].[kamsa_code]) AS kamsa_code
,COALESCE(#CURRENT_EMPLOYEES.[position_id],[GREENHOUSE_TABLE].[position_id]) AS position_id
FROM #CURRENT_EMPLOYEES
FULL JOIN [Headcount].[dbo].[greenhouse_employees] AS GREENHOUSE_TABLE ON #CURRENT_EMPLOYEES.[position_id] = [GREENHOUSE_TABLE].[position_id]
ORDER BY #CURRENT_EMPLOYEES.[hire_date] ASC
I believe you need to You need to INSERT ... SELECT ... WHERE NOT EXISTS(...) or INSERT ... SELECT ... WHERE <Id> NOT IN (...). Something like:
INSERT #CURRENT_EMPLOYEES (employee_status, employee_id, ...)
SELECT employee_status, employee_id, ...
FROM Headcount.dbo.greenhouse_employees GE
WHERE NOT EXISTS (
SELECT *
FROM #CURRENT_EMPLOYEES CE
WHERE CE.employee_id = GE.employee_id
)
The other form is
INSERT #CURRENT_EMPLOYEES (employee_status, employee_id, ...)
SELECT employee_status, employee_id, ...
FROM Headcount.dbo.greenhouse_employees GE
WHERE GE.employee_id NOT IN (
SELECT CE.employee_id
FROM #CURRENT_EMPLOYEES CE
)
Both assume that employee_id is unique in greenhouse_employees.
I feel a little stupid asking this because I feel like this is very easy, but for some reason I'm not able to update a query to not select a specific item based on two criteria.
Let's say I have data like this:
ID Name Variant Count1
110 Bob Type1 0
110 Bob Type2 1
120 John Type1 1
So as you can see we have two BOB rows with same ID but different variant (type1 and type2). I want to be able to only see one of the Bob's.
Desired result:
110 Bob Type2
120 John Type1
So what I've been doing is something like
Select ID, Name, Variant, sum(count1) from tbl1
where (id not in (110) and Variant <> 'type1')
Group by Id,name,variant
Please don't use COUNT as a criteria, because in my example it just so happens that Count=0 for the row that I don't want to see. It can vary.
I have many rows where I can have multiple instances of the same id with a variety of different VARIANTS. I'm looking to exclude certain instances of ID based on Variant value
UPDATE:
It has nothing to do with latest variant, it has to do with a specific variant. So I'm just looking to basically be able to use a clause where i used the ID and VARIANT, in order to remove that particular row.
Aggregating (grouping) the data like you're doing is one way to do it, although the where condition is a little overkill. If all you want to do is see the unique combinations of ID and Name, then another approach is just to use the "distinct" statement.
select distinct Id, Name
from tbl1
If you always want to see data from a specific Variant then just include that condition in your where clause and you don't need to worry about using distinct or aggregates.
select *
from tbl1
where Variant = 'Type 1'
If you always want to see the record associated with the latest Variant, then you can use a window function to do so.
select a.Id, a.Name, a.Variant
from
(
select *, row_number() over (partition by Id order by Variant desc) as RowRank
from tbl1
) a
where RowRank = 1
;
If there is not a predictable pattern for exclusion then you will have to maintain an exclusion list. It's not ideal but if you want to maintain this in the SQL itself then you could have a query like the one below.
select *
from tbl1
-- Define rows to exlcude
where not (Id = 110 and Variant = 'Type 1') -- Your example
and not (Id = 110 and Variant = 'Type 3') -- Theoretical example
;
A better solution would be to create an exclusion reference table to maintain all exclusions within. Then you could simply negative join to that table to retrieve your desired results.
Have you considered using an exclusion table where you can place the ID and Variant combinations that you want to exclude? ( I just used temp tables for this example, you can always use user tables so your exclusion table will always be available)
Here is an example of what I mean based on your example:
if object_id('tempdb..#temp') is not null
drop table #temp
create table #temp (
ID int,
Name varchar(20),
Variant varchar(20),
Count1 int
)
if object_id('tempdb..#tempExclude') is not null
drop table #tempExclude
create table #tempExclude (
ID int,
Variant varchar(20)
)
insert into #temp values
(110,'Bob','Type1',0),
(110,'Bob','Type2',1),
(120,'John','Type1',1),
(120,'John','Type2',1),
(120,'John','Type2',1),
(120,'John','Type2',1),
(120,'John','Type3',1)
insert into #tempExclude values (110,'Type1')
select
t.ID,
t.Name
,t.Variant
,sum(t.Count1) as TotalCount
from
#temp t
left join
#tempExclude te
on t.ID = te.ID
and t.Variant = te.Variant
where
te.id is null
group by
t.ID,
t.Name
,t.Variant
Here are the results:
I think the logic you want is something like:
Select ID, Name, Variant, sum(count1)
from tbl1
where not (id = 110 and variant = 'type1')
Group by Id, name, variant;
For the second condition, just keep adding:
where not (id = 110 and variant = 'type1') and
not (id = 314 and variant = 'popsicle')
You can also express this using a list of exclusions:
select t.ID, Name, t.Variant, sum(t.count1)
from tbl1 t left join
(values (111, 'type1'),
(314, 'popsicle')
) v(id, excluded_variant)
on t.id = v.id and
t.variant = v.excluded_variant
where v.id is not null -- doesn't match an exclusion criterion
group by Id, name, variant;
I have two tables like the following. One is for sport talents of some people and second for arts talents. One may not have a sport talent to list and same applies for art talent.
CREATE TABLE SPORT_TALENT(name varchar(10), TALENT varchar(10));
CREATE TABLE ART_TALENT(name varchar(10), TALENT varchar(10));
INSERT INTO SPORT_TALENT(name, TALENT) VALUES
('Steve', 'Footbal')
,('Steve', 'Golf')
,('Bob' , 'Golf')
,('Mary' , 'Tennnis');
INSERT INTO ART_TALENT(name, TALENT) VALUES
('Steve', 'Dancer')
, ('Steve', 'Singer')
, ('Bob' , 'Dancer')
, ('Bob' , 'Singer')
, ('John' , 'Dancer');
Now I want to list down sport talent and art talent of one person. I would like to avoid duplication. But I don't mind if there is a "null" in any output. I tried the following
select distinct sport_talent.talent as s_talent,art_talent.talent as a_talent
from sport_talent
JOIN art_talent on sport_talent.name=art_talent.name
where (sport_talent.name='Steve' or art_talent.name='Steve');
s_talent | a_talent
----------+----------
Footbal | Dancer
Golf | Singer
Footbal | Singer
Golf | Dancer
I would like to avoid redundancy and need something like the following (distinct values of sport talents + distinct values of art talents).
s_talent | a_talent
----------+----------
Footbal | Dancer
Golf | Singer
As mentioned in subject, I am not looking for distinct combinations. But at the same time, it's OK if there are some records with "null" value in one column. I am relatively new to SQL.
Try:
SELECT s_talent, a_talent
FROM (
SELECT distinct on (talent) talent as s_talent,
dense_rank() over (order by talent) as x
FROM SPORT_TALENT
WHERE name='Steve'
) x
FULL OUTER JOIN (
SELECT distinct on (talent) talent as a_talent,
dense_rank() over (order by talent) as x
FROM ART_TALENT
WHERE name='Steve'
) y
ON x.x = y.x
Demo: http://sqlfiddle.com/#!15/66e04/3
There are no duplicates in your query. Each of the four records in your query return is unique. This result may not be what you want, but seems like its problem is not the duplicate.
Postgres 9.4
... introduces unnest() with multiple arguments. Does exactly what you want, and should be fast, too. Per documentation:
The special table function UNNEST may be called with any number of
array parameters, and it returns a corresponding number of columns, as
if UNNEST (Section 9.18) had been called on each parameter separately
and combined using the ROWS FROM construct.
About ROWS FROM:
Compare result of two table functions using one column from each
SELECT *
FROM unnest(
ARRAY(SELECT DISTINCT talent FROM sport_talent WHERE name = 'Steve')
, ARRAY(SELECT DISTINCT talent FROM art_talent WHERE name = 'Steve')
) AS t(s_talent, a_talent);
Postgres 9.3 or older
SELECT s_talent, a_talent
FROM (
SELECT talent AS s_talent, row_number() OVER () AS rn
FROM sport_talent
WHERE name = 'Steve'
GROUP BY 1
) s
FULL JOIN (
SELECT talent AS a_talent, row_number() OVER () AS rn
FROM art_talent
WHERE name = 'Steve'
GROUP BY 1
) a USING (rn);
Similar previous answers with more explanation:
What type of JOIN to use
Sort columns independently, such that all nulls are last per column
This is similar to what #kordirko posted, but uses GROUP BY to get distinct talents, which is evaluated before window functions. So we only need a bare row_number() and not the more expensive dense_rank().
About the sequence of events in a SELECT query:
Best way to get result count before LIMIT was applied
SQL Fiddle.
I have a query against a large number of big tables (rows and columns) with a number of joins, however one of tables has some duplicate rows of data causing issues for my query. Since this is a read only realtime feed from another department I can't fix that data, however I am trying to prevent issues in my query from it.
Given that, I need to add this crap data as a left join to my good query. The data set looks like:
IDNo FirstName LastName ...
-------------------------------------------
uqx bob smith
abc john willis
ABC john willis
aBc john willis
WTF jeff bridges
sss bill doe
ere sally abby
wtf jeff bridges
...
(about 2 dozen columns, and 100K rows)
My first instinct was to perform a distinct gave me about 80K rows:
SELECT DISTINCT P.IDNo
FROM people P
But when I try the following, I get all the rows back:
SELECT DISTINCT P.*
FROM people P
OR
SELECT
DISTINCT(P.IDNo) AS IDNoUnq
,P.FirstName
,P.LastName
...etc.
FROM people P
I then thought I would do a FIRST() aggregate function on all the columns, however that feels wrong too. Syntactically am I doing something wrong here?
Update:
Just wanted to note: These records are duplicates based on a non-key / non-indexed field of ID listed above. The ID is a text field which although has the same value, it is a different case than the other data causing the issue.
distinct is not a function. It always operates on all columns of the select list.
Your problem is a typical "greatest N per group" problem which can easily be solved using a window function:
select ...
from (
select IDNo,
FirstName,
LastName,
....,
row_number() over (partition by lower(idno) order by firstname) as rn
from people
) t
where rn = 1;
Using the order by clause you can select which of the duplicates you want to pick.
The above can be used in a left join, see below:
select ...
from x
left join (
select IDNo,
FirstName,
LastName,
....,
row_number() over (partition by lower(idno) order by firstname) as rn
from people
) p on p.idno = x.idno and p.rn = 1
where ...
Add an identity column (PeopleID) and then use a correlated subquery to return the first value for each value.
SELECT *
FROM People p
WHERE PeopleID = (
SELECT MIN(PeopleID)
FROM People
WHERE IDNo = p.IDNo
)
After careful consideration this dillema has a few different solutions:
Aggregate Everything
Use an aggregate on each column to get the biggest or smallest field value. This is what I am doing since it takes 2 partially filled out records and "merges" the data.
http://sqlfiddle.com/#!3/59cde/1
SELECT
UPPER(IDNo) AS user_id
, MAX(FirstName) AS name_first
, MAX(LastName) AS name_last
, MAX(entry) AS row_num
FROM people P
GROUP BY
IDNo
Get First (or Last record)
http://sqlfiddle.com/#!3/59cde/23
-- ------------------------------------------------------
-- Notes
-- entry: Auto-Number primary key some sort of unique PK is required for this method
-- IDNo: Should be primary key in feed, but is not, we are making an upper case version
-- This gets the first entry to get last entry, change MIN() to MAX()
-- ------------------------------------------------------
SELECT
PC.user_id
,PData.FirstName
,PData.LastName
,PData.entry
FROM (
SELECT
P2.user_id
,MIN(P2.entry) AS rownum
FROM (
SELECT
UPPER(P.IDNo) AS user_id
, P.entry
FROM people P
) AS P2
GROUP BY
P2.user_id
) AS PC
LEFT JOIN people PData
ON PData.entry = PC.rownum
ORDER BY
PData.entry
Use Cross Apply or Outer Apply, this way you can limit the amount of data to be joined from the table with the duplicates to the first hit.
Select
x.*,
c.*
from
x
Cross Apply
(
Select
Top (1)
IDNo,
FirstName,
LastName,
....,
from
people As p
where
p.idno = x.idno
Order By
p.idno //unnecessary if you don't need a specific match based on order
) As c
Cross Apply behaves like an inner join, Outer Apply like a left join
SQL Server CROSS APPLY and OUTER APPLY
Turns out I was doing it wrong, I needed to perform a nested select first of just the important columns, and do a distinct select off that to prevent trash columns of 'unique' data from corrupting my good data. The following appears to have resolved the issue... but I will try on the full dataset later.
SELECT DISTINCT P2.*
FROM (
SELECT
IDNo
, FirstName
, LastName
FROM people P
) P2
Here is some play data as requested: http://sqlfiddle.com/#!3/050e0d/3
CREATE TABLE people
(
[entry] int
, [IDNo] varchar(3)
, [FirstName] varchar(5)
, [LastName] varchar(7)
);
INSERT INTO people
(entry,[IDNo], [FirstName], [LastName])
VALUES
(1,'uqx', 'bob', 'smith'),
(2,'abc', 'john', 'willis'),
(3,'ABC', 'john', 'willis'),
(4,'aBc', 'john', 'willis'),
(5,'WTF', 'jeff', 'bridges'),
(6,'Sss', 'bill', 'doe'),
(7,'sSs', 'bill', 'doe'),
(8,'ssS', 'bill', 'doe'),
(9,'ere', 'sally', 'abby'),
(10,'wtf', 'jeff', 'bridges')
;
Try this
SELECT *
FROM people P
where P.IDNo in (SELECT DISTINCT IDNo
FROM people)
Depending on the nature of the duplicate rows, it looks like all you want is to have case-sensitivity on those columns. Setting the collation on these columns should be what you're after:
SELECT DISTINCT p.IDNO COLLATE SQL_Latin1_General_CP1_CI_AS, p.FirstName COLLATE SQL_Latin1_General_CP1_CI_AS, p.LastName COLLATE SQL_Latin1_General_CP1_CI_AS
FROM people P
http://msdn.microsoft.com/en-us/library/ms184391.aspx
I have one table with gender as one of the columns.
In gender column only M or F are allowed.
Now i want to sort the table so that while displaying the table in gender field M and F will come alternetivly.
I have Tried....
I have tried to create one(new) table with the same structure as my existing table.
Now using high leval insert i want to insert M to odd rows and F to even rows.
After that i want to join those two statements using union operator.
I am able to insert to ( new ) the table only male or female but not to the even or odd rows...
Can any body help me regarding this....
Thanks in Advance....
Don't consider a table to be "sorted". The SQL server may return the rows in any order depending on execution plan, index, joins etc. If you want a strict order you need to have an ordered column, like an identity column. Usually it is better to apply the desired sorting when selecting data.
However the interleaving of M and F is a little bit tricky, you need to use the ROW_NUMBER function.
Valid SQL Server code:
CREATE TABLE #GenderTable(
[Name] [nchar](10) NOT NULL,
[Gender] [char](1) NOT NULL
)
-- Create sample data
insert into #GenderTable (Name, Gender) values
('Adam', 'M'),
('Ben', 'M'),
('Casesar', 'M'),
('Alice', 'F'),
('Beatrice', 'F'),
('Cecilia', 'F')
SELECT * FROM #GenderTable
SELECT * FROM #GenderTable
order by ROW_NUMBER() over (partition by gender order by name), Gender
DROP TABLE #GenderTable
This gives the output
Name Gender
Adam M
Ben M
Casesar M
Alice F
Beatrice F
Cecilia F
and
Name Gender
Alice F
Adam M
Beatrice F
Ben M
Cecilia F
Casesar M
If you use another DBMS the syntax may differ.
I think the best way to do it would be to have two queries (one for M, one for F) and then join them together. The catch would be you would have to calculate the "rank" of each query and then sort accordingly.
Something like the following should do what you need:
select * from
(select
#rownum:=#rownum+1 rank,
t.*
from people_table t,
(SELECT #rownum:=0) r
where t.gender = 'M'
union
select
#rownum:=#rownum+1 rank,
t.*
from people_table t,
(SELECT #rownum:=0) r
where t.gender = 'F') joined
order by joined.rank, joined.gender;
If you are using SQL Server, you can seed your two tables with an IDENTITY column as follows. Make one odd and one even and then union and sort by this column.
Note that you can only truly alternate if there are the same number of male and female records. If there are more of one than the other, you will end up with non-alternating rows at the end.
CREATE TABLE MaleTable(Id INT IDENTITY(1,2) NOT NULL, Gender CHAR(1) NOT NULL)
INSERT INTO MaleTable(Gender) SELECT 'M'
INSERT INTO MaleTable(Gender) SELECT 'M'
INSERT INTO MaleTable(Gender) SELECT 'M'
CREATE TABLE FemaleTable(Id INT IDENTITY(2,2) NOT NULL, Gender CHAR(1) NOT NULL)
INSERT INTO FemaleTable(Gender) SELECT 'F'
INSERT INTO FemaleTable(Gender) SELECT 'F'
INSERT INTO FemaleTable(Gender) SELECT 'F'
SELECT u.Id
,u.Gender
FROM (
SELECT Id, Gender
FROM FemaleTable
UNION
SELECT Id, Gender
FROM MaleTable
) u
ORDER BY u.Id ASC
See here for a working example