Select from multiple tables, remove duplicates - sql

I have two tables in a SQLite DB, and both have the following fields:
idnumber, firstname, middlename, lastname, email, login
One table has all of these populated; the other doesn't have idnumber or middlename populated.
I'd LIKE to be able to do something like:
select idnumber, firstname, middlename, lastname, email, login
from users1,users2 group by login;
But I get an "ambiguous" error. Doing something like:
select idnumber, firstname, middlename, lastname, email, login from users1
union
select idnumber, firstname, middlename, lastname, email, login from users2;
It LOOKS like it works, but I see duplicates. My understanding is that UNION shouldn't allow duplicates, but maybe they're not real duplicates, since the second user table doesn't have all the fields populated (e.g. "20, bob, alan, smith, bob@bob.com, bob" is not the same as "NULL, bob, NULL, smith, bob@bob.com, bob").
Any ideas? What am I missing? All I want to do is dedupe based on "login".
Thanks!

As you say, UNION removes duplicate rows (note that UNION ALL does not!). Two rows are considered duplicates only when all of their column values match. In the example from your question, NULL does not match 20 or 'alan', so those rows are not considered duplicates.
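A small sketch with Python's built-in sqlite3 module illustrates this, using the rows from the question. The extra exact copy of bob's complete row in users2 is a hypothetical addition to show what UNION does remove.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
schema = ("(idnumber INTEGER, firstname TEXT, middlename TEXT, "
          "lastname TEXT, email TEXT, login TEXT)")
conn.execute("CREATE TABLE users1 " + schema)
conn.execute("CREATE TABLE users2 " + schema)

# Complete row in users1; the same person with idnumber/middlename missing in users2
conn.execute("INSERT INTO users1 VALUES (20, 'bob', 'alan', 'smith', 'bob@bob.com', 'bob')")
conn.execute("INSERT INTO users2 VALUES (NULL, 'bob', NULL, 'smith', 'bob@bob.com', 'bob')")
# Hypothetical exact copy of the users1 row
conn.execute("INSERT INTO users2 VALUES (20, 'bob', 'alan', 'smith', 'bob@bob.com', 'bob')")

cols = "idnumber, firstname, middlename, lastname, email, login"
union_rows = conn.execute(
    f"SELECT {cols} FROM users1 UNION SELECT {cols} FROM users2").fetchall()
union_all_rows = conn.execute(
    f"SELECT {cols} FROM users1 UNION ALL SELECT {cols} FROM users2").fetchall()

print(len(union_rows))      # 2: the exact copy is removed, the row with NULLs is not
print(len(union_all_rows))  # 3: UNION ALL keeps every row
```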
Edit:
[...] the only way I can think would be creating a new table [...]
That is not necessary. I think you can do the following:
select login, max(idnumber), max(firstname), max(middlename), max(lastname),
max(email) from (
select idnumber, firstname, middlename, lastname, email, login from users1
union
select idnumber, firstname, middlename, lastname, email, login from users2
) final
group by login
However, if you're sure that only idnumber and middlename can differ between the tables, you can apply MAX to just those two columns and GROUP BY all the rest.
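A minimal runnable sketch of the MAX/GROUP BY query in SQLite, via Python's sqlite3 module. The column aliases and the extra 'ann' row (a login that exists only in the incomplete table) are assumptions added for illustration. MAX ignores NULLs, so bob keeps his idnumber and middlename from users1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
schema = ("(idnumber INTEGER, firstname TEXT, middlename TEXT, "
          "lastname TEXT, email TEXT, login TEXT)")
conn.execute("CREATE TABLE users1 " + schema)
conn.execute("CREATE TABLE users2 " + schema)
conn.execute("INSERT INTO users1 VALUES (20, 'bob', 'alan', 'smith', 'bob@bob.com', 'bob')")
conn.execute("INSERT INTO users2 VALUES (NULL, 'bob', NULL, 'smith', 'bob@bob.com', 'bob')")
# Hypothetical login that only exists in the incomplete table
conn.execute("INSERT INTO users2 VALUES (NULL, 'ann', NULL, 'jones', 'ann@ann.com', 'ann')")

query = """
select login, max(idnumber) as idnumber, max(firstname) as firstname,
       max(middlename) as middlename, max(lastname) as lastname,
       max(email) as email
from (
    select idnumber, firstname, middlename, lastname, email, login from users1
    union
    select idnumber, firstname, middlename, lastname, email, login from users2
) final
group by login
order by login
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Note that each MAX is computed per column, so if both tables held different non-NULL values for the same login, the result could mix values from different source rows.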

You could LEFT JOIN the incomplete table to the complete one on login, then manipulate the resulting set programmatically.
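A rough sketch of that approach in Python with sqlite3: join on login, then prefer the complete table's value and fall back to the incomplete table's value when it is NULL. The 'carl' rows (including his missing email in users1) are hypothetical data chosen to show the fallback.

```python
import sqlite3

COLS = ["idnumber", "firstname", "middlename", "lastname", "email", "login"]

conn = sqlite3.connect(":memory:")
schema = "(" + ", ".join(COLS) + ")"
conn.execute("CREATE TABLE users1 " + schema)  # complete table
conn.execute("CREATE TABLE users2 " + schema)  # incomplete table
conn.executemany("INSERT INTO users1 VALUES (?, ?, ?, ?, ?, ?)", [
    (20, "bob", "alan", "smith", "bob@bob.com", "bob"),
    (21, "carl", "jay", "doe", None, "carl"),  # hypothetical: email missing here
])
conn.executemany("INSERT INTO users2 VALUES (?, ?, ?, ?, ?, ?)", [
    (None, "bob", None, "smith", "bob@bob.com", "bob"),
    (None, "carl", None, "doe", "carl@doe.com", "carl"),
])

# Complete table on the left, incomplete table joined to it on login
select_list = ", ".join([f"u1.{c}" for c in COLS] + [f"u2.{c}" for c in COLS])
query = (f"SELECT {select_list} FROM users1 u1 "
         "LEFT JOIN users2 u2 ON u2.login = u1.login")

merged = {}
for row in conn.execute(query):
    complete, incomplete = row[:len(COLS)], row[len(COLS):]
    # Prefer the complete table's value; fall back to the joined one
    record = {c: (a if a is not None else b)
              for c, a, b in zip(COLS, complete, incomplete)}
    merged[record["login"]] = record  # keyed by login, so one record per login

print(merged["bob"]["idnumber"])  # 20
print(merged["carl"]["email"])    # carl@doe.com
```

Because this is a LEFT JOIN from the complete table, a login that exists only in the incomplete table will not appear in the result.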

Related

SQL ignore one field with DISTINCT but ORDER BY it

I have two SQL SELECT statements combined with UNION DISTINCT: one with data from a new database and one with data from an old database, so each SELECT has a field that indicates which database the data came from.
A simplified example:
SELECT username, name, lastname, 1 AS DB
From new_DB.users
UNION DISTINCT
SELECT username, name, lastname, 2 AS DB
From old_DB.users
ORDER BY db, lastname, name ASC
Data output looks like this:
username | name  | lastname | DB
Fmuster  | Fiona | Muster   | 1
kroos    | Kim   | Roos     | 1
Mmuster  | Max   | Muster   | 1
kroos    | Kim   | Roos     | 2
Ysoroli  | Yelda | Soroli   | 2
My problem is that there is duplicated data in the output.
The row that shouldn't be in the output is the second kroos, but I can't just remove the DB field because I have to show all results from new_DB (DB 1) at the top.
Thanks for your help,
Kim
You can use NOT EXISTS to filter out duplicate rows coming from old_DB.users:
SELECT username, name, lastname, 1 AS DB
FROM new_DB.users
UNION
SELECT o.username, o.name, o.lastname, 2 AS DB
FROM old_DB.users o
WHERE NOT EXISTS (
SELECT 1
FROM new_DB.users n
WHERE (n.username, n.name, n.lastname) = (o.username, o.name, o.lastname)
)
ORDER BY db, lastname, name ASC;
Or, with aggregation:
SELECT username, name, lastname, MIN(DB) AS DB
FROM (
SELECT username, name, lastname, 1 AS DB
FROM new_DB.users
UNION
SELECT username, name, lastname, 2 AS DB
FROM old_DB.users
) t
GROUP BY username, name, lastname
ORDER BY db, lastname, name ASC;
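Both queries can be tried in SQLite through Python's sqlite3 module. This sketch attaches two empty in-memory databases named new_DB and old_DB so the schema-qualified table names work unchanged, loads the rows from the question, and checks that the two approaches agree. The row-value comparison in the NOT EXISTS query assumes SQLite 3.15 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Two extra in-memory databases stand in for the question's new_DB and old_DB
conn.execute("ATTACH DATABASE ':memory:' AS new_DB")
conn.execute("ATTACH DATABASE ':memory:' AS old_DB")
for db in ("new_DB", "old_DB"):
    conn.execute(f"CREATE TABLE {db}.users (username TEXT, name TEXT, lastname TEXT)")
conn.executemany("INSERT INTO new_DB.users VALUES (?, ?, ?)", [
    ("Fmuster", "Fiona", "Muster"), ("kroos", "Kim", "Roos"), ("Mmuster", "Max", "Muster")])
conn.executemany("INSERT INTO old_DB.users VALUES (?, ?, ?)", [
    ("kroos", "Kim", "Roos"), ("Ysoroli", "Yelda", "Soroli")])

not_exists_query = """
SELECT username, name, lastname, 1 AS DB
FROM new_DB.users
UNION
SELECT o.username, o.name, o.lastname, 2 AS DB
FROM old_DB.users o
WHERE NOT EXISTS (
    SELECT 1
    FROM new_DB.users n
    WHERE (n.username, n.name, n.lastname) = (o.username, o.name, o.lastname)
)
ORDER BY DB, lastname, name
"""

aggregate_query = """
SELECT username, name, lastname, MIN(DB) AS DB
FROM (
    SELECT username, name, lastname, 1 AS DB FROM new_DB.users
    UNION
    SELECT username, name, lastname, 2 AS DB FROM old_DB.users
) t
GROUP BY username, name, lastname
ORDER BY DB, lastname, name
"""

rows_not_exists = conn.execute(not_exists_query).fetchall()
rows_aggregate = conn.execute(aggregate_query).fetchall()
for row in rows_not_exists:
    print(row)  # kroos appears once, tagged with DB 1
```

The NOT EXISTS version never produces the old_DB copy in the first place, while the aggregation version produces it and then collapses it with MIN(DB); with this data both return the same four rows.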

VB.NET/Access - SELECT SUM SQL statement

I have a table with LastName, FirstName, Wins, Losses, CompFormat and Medals columns. In case the person who told me not to use pictures on my last question sees this: I tried your suggestion and couldn't figure it out, so I have to use pictures. Please don't bite my head off this time. I successfully summed Wins, grouped by name and CompFormat, like this:
("SELECT SUM(Wins) AS Total, FirstName, LastName, CompFormat FROM CompetitionDate GROUP BY LastName, FirstName, CompFormat;")
This correctly produced the following in my DataGridView: [screenshot]
Instead, I want to also total the losses and group the result so it looks like this: [screenshot]
Here is my Access table: [screenshot]
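I think you just need to add a SUM of the Losses column (using aliases that differ from the column names, since Access can reject an alias such as Wins on SUM(Wins) with a "circular reference" error):
SELECT FirstName, LastName, CompFormat,
SUM(Wins) AS TotalWins, SUM(Losses) AS TotalLosses
FROM CompetitionDate
GROUP BY LastName, FirstName, CompFormat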
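A quick check of the grouped SUM using SQLite through Python's sqlite3 module rather than Access. The column types and the sample rows are hypothetical, since the real data was only shown as a screenshot, and the sketch uses the aliases TotalWins and TotalLosses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE CompetitionDate (
    LastName TEXT, FirstName TEXT, Wins INTEGER, Losses INTEGER,
    CompFormat TEXT, Medals INTEGER)""")
# Hypothetical sample rows: two Singles results and one Doubles result
conn.executemany("INSERT INTO CompetitionDate VALUES (?, ?, ?, ?, ?, ?)", [
    ("Smith", "Ann", 3, 1, "Singles", 0),
    ("Smith", "Ann", 2, 2, "Singles", 1),
    ("Smith", "Ann", 1, 0, "Doubles", 0),
])

rows = conn.execute("""
SELECT FirstName, LastName, CompFormat,
       SUM(Wins) AS TotalWins, SUM(Losses) AS TotalLosses
FROM CompetitionDate
GROUP BY LastName, FirstName, CompFormat
ORDER BY LastName, FirstName, CompFormat
""").fetchall()
for row in rows:
    print(row)
```

Each person/format combination becomes one row carrying both totals, so a single query feeds both columns of the grid.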

Inserting column data from another table together with more values

How do I insert a few columns from TableB into TableA together with some additional values?
Following is one way I tried that failed, but it clearly shows what I want to achieve:
Insert into
TableA (UserID, FirstName, Lastname,EmailAddress,IsActive,IsOnline,IsLockedOut,Comment)
values
(Select distinct UserID, FirstName, LastName, EmailAddress from TableB,0,0,0,'Imported')
You cannot use VALUES together with a SELECT. Instead, put the constant values in the SELECT list itself. Try this:
Insert into
TableA
(
UserID, FirstName, Lastname,
EmailAddress,IsActive,IsOnline,
IsLockedOut,Comment
)
Select distinct UserID, FirstName, LastName,
EmailAddress ,
0,0,0,'Imported'
FROM TableB
You need to include the hard-coded values along with the columns, before the FROM part of the SELECT that feeds your INSERT. So change the SELECT in your query to this:
Select distinct UserID, FirstName, LastName, EmailAddress, 0, 0, 0, 'Imported'
from TableB
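An INSERT ... SELECT with constant columns can be checked in SQLite via Python's sqlite3 module. The column types and sample TableB rows below are assumptions; the duplicated Ann row shows DISTINCT at work.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE TableA (
    UserID INTEGER, FirstName TEXT, Lastname TEXT, EmailAddress TEXT,
    IsActive INTEGER, IsOnline INTEGER, IsLockedOut INTEGER, Comment TEXT)""")
conn.execute("CREATE TABLE TableB (UserID INTEGER, FirstName TEXT, LastName TEXT, EmailAddress TEXT)")
conn.executemany("INSERT INTO TableB VALUES (?, ?, ?, ?)", [
    (1, "Ann", "Lee", "ann@example.com"),
    (1, "Ann", "Lee", "ann@example.com"),  # duplicate row, removed by DISTINCT
    (2, "Bob", "Ray", "bob@example.com"),
])

# Constants sit in the SELECT list, in the same positions as their target columns
conn.execute("""
INSERT INTO TableA (UserID, FirstName, Lastname, EmailAddress,
                    IsActive, IsOnline, IsLockedOut, Comment)
SELECT DISTINCT UserID, FirstName, LastName, EmailAddress,
       0, 0, 0, 'Imported'
FROM TableB
""")
rows = conn.execute("SELECT * FROM TableA ORDER BY UserID").fetchall()
for row in rows:
    print(row)
```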

group by while selecting many more columns

I have this query:
select first_name, last_name, MAX(date)
from person p inner join
address a on
a.person_id = p.id
group by first_name, last_name
where the tables are person(sid, last_name, first_name) and address(date, cp, sid, city).
My question is: how can I write a query that selects first_name, last_name, MAX(date), city and cp
without adding city and cp to the GROUP BY?
I mean I want all 5 columns, but only for the data grouped by first_name, last_name and date.
Many thanks
This is not possible as asked. Say you have three John Smiths in your database, each with one or two addresses. When you group by name, which city do you want to get? The city of which John Smith, and from which of his addresses? Because there is no implicit answer to this question, there is no way to write the SELECT without explicitly saying which city is to be selected.
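One way of saying explicitly which city you want is "the city and cp of each person's most recent address". A sketch of that, run in SQLite via Python's sqlite3 module, assuming address.sid references person.sid and that dates are stored as ISO-8601 text (so string comparison orders them correctly). It groups per person (sid) rather than per name, which also keeps the two John Smiths apart.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (sid INTEGER PRIMARY KEY, last_name TEXT, first_name TEXT)")
conn.execute("CREATE TABLE address (date TEXT, cp TEXT, sid INTEGER, city TEXT)")
conn.executemany("INSERT INTO person VALUES (?, ?, ?)", [
    (1, "Smith", "John"),
    (2, "Smith", "John"),  # a second, different John Smith
])
conn.executemany("INSERT INTO address VALUES (?, ?, ?, ?)", [
    ("2020-01-01", "75001", 1, "Paris"),
    ("2022-06-01", "69001", 1, "Lyon"),
    ("2021-03-15", "13001", 2, "Marseille"),
])

# Keep only the address row whose date equals that person's latest date
rows = conn.execute("""
SELECT p.first_name, p.last_name, a.date, a.city, a.cp
FROM person p
INNER JOIN address a ON a.sid = p.sid
WHERE a.date = (SELECT MAX(a2.date) FROM address a2 WHERE a2.sid = p.sid)
ORDER BY p.sid
""").fetchall()
for row in rows:
    print(row)
```

If a person has two addresses sharing the same latest date, this returns both rows; you would need a further tie-breaking rule to pick one.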

Inserting multiple rows using SQL - issue with manually incrementing numbers

Someone else designed this table and I am not allowed to modify it so bear with me.
I am trying to insert multiple rows from one table into another. The table where I am inserting the rows has an ID but it does not auto-increment. I cannot figure out how to manually increment the id as I insert rows. The current code throws an error:
Error running query. Page15.CaseSerial is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
I've tried adding a GROUP BY clause with no success.
Here's the code:
insert into page4 (serial, caseserial, linkserial, type, add1, add2, city, state, orgname, prefername, email, firstname, lastname, salutation, contactstatus, workphone, notes, cellphone, nametype, homephone, fax, zip, payments)
select id = max(serial), caseserial, linkserial, type, add1, add2, city, state,
orgname, prefername, email, firstname, lastname, salutation, contactstatus,
workphone, notes, cellphone, nametype, homephone, fax, zip, payments
from page16
It would be nice if I could write something that gets the highest id in page4 and numbers the inserted rows starting from the next value.
Thanks!
declare @maxId int
select @maxId = isnull(max(yourIdColumn), 0)  -- 0 when the table is still empty
from YourTable
set @maxId = @maxId + 1
insert into YourTable (yourIdColumn, ....)
values (@maxId, ....)
Disclaimer: I'm not sure how this would carry over to other RDBMSs; this is written with SQL Server in mind. Also, this handles inserting only one value. If you need to insert a set of values, please let me know.
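For the set-of-rows case from the question, a common set-based pattern is to add ROW_NUMBER() to the current maximum inside a single INSERT ... SELECT. The sketch below runs it in SQLite via Python's sqlite3 module (ROW_NUMBER() needs SQLite 3.25 or later). The trimmed-down page4/page16 tables and their rows are hypothetical, and ordering the new rows by caseserial is an arbitrary choice. The same INSERT ... SELECT should carry over to SQL Server with little or no change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, trimmed-down versions of the question's tables
conn.execute("CREATE TABLE page4 (serial INTEGER, caseserial INTEGER, lastname TEXT)")
conn.execute("CREATE TABLE page16 (caseserial INTEGER, lastname TEXT)")
conn.executemany("INSERT INTO page4 VALUES (?, ?, ?)", [(1, 100, "Adams"), (2, 101, "Baker")])
conn.executemany("INSERT INTO page16 VALUES (?, ?)", [(200, "Clark"), (201, "Davis"), (202, "Evans")])

# ROW_NUMBER() numbers the new rows 1, 2, 3, ...; adding the current maximum
# serial (0 if page4 is empty) continues the existing sequence
conn.execute("""
INSERT INTO page4 (serial, caseserial, lastname)
SELECT (SELECT COALESCE(MAX(serial), 0) FROM page4)
       + ROW_NUMBER() OVER (ORDER BY caseserial),
       caseserial, lastname
FROM page16
""")
rows = conn.execute("SELECT serial, caseserial, lastname FROM page4 ORDER BY serial").fetchall()
for row in rows:
    print(row)
```

Like the single-row version, MAX + 1 numbering is not safe if other sessions insert into the same table concurrently.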