How to remove mirrored duplicate rows in SQL when joining a table to itself - sql

How can I remove mirrored duplicates, where a pair appears both as (a, b) and as (b, a)?
with a as (
    select w.id, w.doc, w.org,
           d.name_s, d.name_f, d.name_p, d.spec,
           o.name, o.extid
    from crm_s_workplaces w
    join crm_s_docs d on d.id = w.doc
    join crm_s_orgs o on o.id = w.org
    where d.active = 1 and d.cst = 'NY'
      and w.active = 1 and w.cst = 'NY'
      and o.active = 1 and o.cst = 'NY'
)
select a1.doc, a2.doc, a1.org,
       a1.name_s, a1.name_f, a1.name_p,
       a2.name_s, a2.name_f, a2.name_p
from a a1
join a a2
  on a1.name_s = a2.name_s
 and substr(a1.name_f, 1, 1) = substr(a2.name_f, 1, 1)
 and substr(a1.name_p, 1, 1) = substr(a2.name_p, 1, 1)
 and a1.org = a2.org
 and a1.spec <> a2.spec
order by a1.name_s
ER model diagram:
Repeat example:
Sometimes a pair also comes up with a1.spec > a2.spec:

What you are calling "duplicates" are actually not duplicates in your database.
You basically have multiple doc records that may or may not be the same person. Note that even their names do not always match. For instance,
doc_id 1131385 has NAME_F = "Gabr" while
doc_id 1447530 has NAME_F = "Gabor"
In your database these are two different entities, and you cannot match them using the primary key. You can try to join on the first, middle and last names, but as the Gabor/Gabr example shows, even that will not work reliably.
Can you change the schema of the database? If so, separate the doctors into one table, with one record per doctor, and keep the specializations in a separate table with the following columns:
spec_id (int, PK)
doc_id (foreign key to Doc table)
specialization
That way, a doctor with 3 specs shows up only once in the doc table and three times in the spec table.
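A minimal sketch of that layout, run through Python's built-in sqlite3 module so it is self-contained. The table names (doc, doc_spec), the sample doctor and the three specializations are made up for illustration; only the three spec-table columns come from the answer above.

```python
import sqlite3

# Hypothetical table names and sample data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE doc (
        doc_id INTEGER PRIMARY KEY,
        name_s TEXT, name_f TEXT, name_p TEXT
    );
    CREATE TABLE doc_spec (
        spec_id        INTEGER PRIMARY KEY,
        doc_id         INTEGER NOT NULL REFERENCES doc(doc_id),
        specialization TEXT NOT NULL
    );
    INSERT INTO doc VALUES (1, 'Kovacs', 'Gabor', 'P');
    INSERT INTO doc_spec (doc_id, specialization) VALUES
        (1, 'Therapist'), (1, 'Surgeon'), (1, 'Admin');
""")

# One row per doctor, one row per (doctor, specialization).
doc_count = conn.execute("SELECT COUNT(*) FROM doc").fetchone()[0]
specs = [row[0] for row in conn.execute(
    "SELECT specialization FROM doc_spec WHERE doc_id = ? ORDER BY specialization",
    (1,),
)]
print(doc_count, specs)
```

With this shape, "same doctor, different spec" is no longer something the self-join has to guess at from names.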
I just noticed something else. You have a spec field in the workplaces table. Why? If you meant to say that Doc Gabor works as an admin in hospital 1 but as a therapist in hospital 2, you can do that. However, you then have to remove the spec field from the doc table and use only the spec in the workplaces table.
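On the literal question of the mirrored rows (a, b) and (b, a): a common way to keep only one of each pair in a self-join is to add an ordering condition on a unique column, for example a1.doc < a2.doc. The sketch below assumes a simplified stand-in for the CTE a; the two doc ids are the ones mentioned above, while the surnames, org number and specs are invented. It runs in SQLite via Python's sqlite3 module, not against the original database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplified, hypothetical stand-in for the CTE "a" in the question.
    CREATE TABLE a (doc INTEGER, org INTEGER, name_s TEXT,
                    name_f TEXT, name_p TEXT, spec TEXT);
    INSERT INTO a VALUES
        (1131385, 10, 'Kovacs', 'Gabr',  'P', 'therapist'),
        (1447530, 10, 'Kovacs', 'Gabor', 'P', 'surgeon'),
        (2000001, 10, 'Nagy',   'Anna',  'M', 'surgeon');
""")

join_sql = """
    SELECT a1.doc, a2.doc
    FROM a a1
    JOIN a a2
      ON a1.name_s = a2.name_s
     AND substr(a1.name_f, 1, 1) = substr(a2.name_f, 1, 1)
     AND substr(a1.name_p, 1, 1) = substr(a2.name_p, 1, 1)
     AND a1.org = a2.org
     AND a1.spec <> a2.spec
     {extra}
    ORDER BY a1.doc
"""

# Original join: every match is reported twice, once in each direction.
mirrored = conn.execute(join_sql.format(extra="")).fetchall()

# Extra ordering condition: each pair is reported once.
unique_pairs = conn.execute(
    join_sql.format(extra="AND a1.doc < a2.doc")
).fetchall()

print(mirrored)
print(unique_pairs)
```

This only removes the mirror image of each pair. It does not settle whether two doc rows really are the same person, which is the schema problem described above.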

Related

SQL Query across two tables only show most recently updated result per tag address

I have two tables: violator_state and violator_tags
violator_state:
m_state_id
is_violating
m_translatedid
m_tag
m_violator_tag
This table holds the "tags" and has a fixed row count of 10 in this case. Its purpose is to list each tag present, connect the full tag address (m_violator_tag) with its shorthand name (m_tag), and state whether it is in "violation". I need to use this table as a reference because of the link between m_violator_tag and m_tag.
violator_tags
m_violator_id
m_eval_time_from
m_eval_time_to
m_tag
m_tag_peers
m_tag_position
New rows are constantly added to this table; each row records which tags are in violation with a specific tag. So it would show T6 in violation with T1, T2, T9, etc.
I am looking to create a query which joins the two tables to show only the most recently updated row (largest m_eval_time_from) for each tag.
I am using the following query to join the two tables. I expect m_translatedid and m_tag to match, but they do not, and I am unsure why.
SELECT violator_state.m_violator_tag,
       violator_state.is_violating,
       violator_state.m_translatedid,
       violator_tags.m_tag,
       violator_tags.m_eval_time_to,
       violator_tags.m_tag_peers,
       violator_tags.m_tag_position,
       violator_tags.m_eval_time_from
FROM violator_tags
CROSS JOIN violator_state
Violation_state table
violation_tags table
results of my (incorrect) query
Any suggestions on what I should try?
Your CROSS JOIN gives you a Cartesian product, where every row in the first table is paired with every row in the second table; e.g. if you have 10 rows in each, you get 10 x 10 = 100 rows in the result! I believe you need to join the tables on the m_tag column and select the violator_tags row with the latest m_eval_time_from. The query below should do this for you (though you haven't presented your question in a way that makes it easy for me to double-check my code; see the link provided by a_horse_with_no_name for more on this, or use a website like db-fiddle to set up your example).
SELECT vs.m_violator_tag,
       vs.is_violating,
       vs.m_translatedid,
       vt.m_tag,
       vt.m_eval_time_to,
       vt.m_tag_peers,
       vt.m_tag_position,
       vt.m_eval_time_from
FROM violator_tags vt
JOIN violator_state vs
  ON vt.m_tag = vs.m_tag
 AND vt.m_eval_time_from = (SELECT MAX(vt2.m_eval_time_from)
                            FROM violator_tags vt2
                            WHERE vt2.m_tag = vt.m_tag)
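To check the "latest row per tag" idea on a small scale, here is a sketch using Python's built-in sqlite3 module. The tables are trimmed to a few of the columns named in the question, and the tag addresses, timestamps and peers are made up. Timestamps are stored as ISO-8601 text, which is an assumption; in that form they compare correctly as strings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Trimmed-down versions of the two tables; sample rows are invented.
    CREATE TABLE violator_state (m_tag TEXT, m_violator_tag TEXT, is_violating INTEGER);
    CREATE TABLE violator_tags  (m_tag TEXT, m_eval_time_from TEXT, m_tag_peers TEXT);
    INSERT INTO violator_state VALUES
        ('T1', 'plant/area/T1', 1),
        ('T6', 'plant/area/T6', 1);
    INSERT INTO violator_tags VALUES
        ('T1', '2020-01-01 10:00:00', 'T2'),
        ('T1', '2020-01-01 11:00:00', 'T2,T9'),
        ('T6', '2020-01-01 09:30:00', 'T1,T2,T9');
""")

# The correlated subquery uses its own alias (vt2) so that MAX() is taken
# over all rows of the same tag, not over the single outer row.
rows = conn.execute("""
    SELECT vs.m_violator_tag, vt.m_tag, vt.m_eval_time_from, vt.m_tag_peers
    FROM violator_tags vt
    JOIN violator_state vs ON vt.m_tag = vs.m_tag
    WHERE vt.m_eval_time_from = (SELECT MAX(vt2.m_eval_time_from)
                                 FROM violator_tags vt2
                                 WHERE vt2.m_tag = vt.m_tag)
    ORDER BY vt.m_tag
""").fetchall()

for row in rows:
    print(row)
```

If two rows of the same tag share the same maximum m_eval_time_from, both are returned, so a tie-breaker column would be needed in that case.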

Retrieve columns multiple times using SQL

In Access 2007 I have a table named Registars with a list of people and a table named Related.
Registars has a Primary key of Reg_ID and a field of Reg_Surname and a field of Reg_Forename.
The Related table has the fields Reg_Person_ID and Rel_Person_ID, which together form the primary key (a composite key), and a third field, relation_Type, e.g. cousin, sister, etc.
What I am trying to write is an SQL query that will interrogate these two tables and, for each record in the Related table, output the ID of the first person, then their forename, then their surname, then the second person's ID, then their forename, then their surname. That is:
Reg_Person_ID Reg_Forename Reg_Surname Rel_Person_ID Reg_Forename Reg_Surname
So far what I have tried using SQL hasn't worked. Below is a screen dump of the two tables with data and the desired output.
SELECT
reg.Reg_ID AS Reg_Person_ID
,reg.Reg_Forename
,reg.Reg_Surname
,rel.Rel_Person_ID
,rr.Reg_Forename AS Rel_Forename
,rr.Reg_Surname AS Rel_Surname
,rel.Relation_Type
FROM
(Registrars AS reg
LEFT OUTER JOIN
Related AS rel
ON reg.Reg_ID = rel.Reg_Person_ID)
LEFT OUTER JOIN
Registrars AS rr
ON rel.Rel_Person_ID = rr.Reg_ID
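The key idea is joining the people table twice under two different aliases, once for each side of the relation. Below is a small sketch of that, driven from the Related table so that exactly one row comes out per relation. It runs in SQLite through Python's sqlite3 module rather than in Access (where multiple joins need the parentheses), the sample people are made up, and the table name Registrars follows the spelling used in the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Registrars (Reg_ID INTEGER PRIMARY KEY,
                             Reg_Forename TEXT, Reg_Surname TEXT);
    CREATE TABLE Related (Reg_Person_ID INTEGER, Rel_Person_ID INTEGER,
                          Relation_Type TEXT,
                          PRIMARY KEY (Reg_Person_ID, Rel_Person_ID));
    -- Invented sample data.
    INSERT INTO Registrars VALUES (1, 'Ann', 'Lee'), (2, 'Bob', 'Lee'), (3, 'Cy', 'Ray');
    INSERT INTO Related VALUES (1, 2, 'sister'), (3, 1, 'cousin');
""")

# p1 is the first person of each relation, p2 the related person.
rows = conn.execute("""
    SELECT rel.Reg_Person_ID, p1.Reg_Forename, p1.Reg_Surname,
           rel.Rel_Person_ID, p2.Reg_Forename, p2.Reg_Surname,
           rel.Relation_Type
    FROM Related rel
    JOIN Registrars p1 ON p1.Reg_ID = rel.Reg_Person_ID
    JOIN Registrars p2 ON p2.Reg_ID = rel.Rel_Person_ID
    ORDER BY rel.Reg_Person_ID
""").fetchall()

for row in rows:
    print(row)
```

Starting from Registrars with LEFT JOINs, as in the query above, additionally lists people who have no relations at all; starting from Related lists only actual relations.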

Inserting Into and Maintaining Many-to-Many Tables

SQLite3 user.
I have read through numerous books on relational DBs and SQL and not one shows how to maintain the linking tables for many-to-many relationships. I just went through a book that went into the details of SELECT and JOINs with examples, but then glossed over the same when many-to-many relationships were covered. The author just showed some pseudo code for a table, no data, and then a pseudo code query--WTF? I am probably missing something, but it has become quite maddening.
Anyways, say I have a table like [People] with 3 columns: pID (primary), name and age. A table [Groups] with 3 columns: gID (primary), groupname and years. Since people can belong to multiple groups and groups can have multiple people, I set up a linking table called [peoplegroups] with two columns: pID, and gID both of which come from their respective tables.
So, how do I efficiently get data into the linking table when INSERTING into the others, and how do I get data out using the linking table?
Example: I want to INSERT "Jane" into [People] and make her a member of group gID 2, "bowlers", and update the linking table [peoplegroups] at the same time.
Later I want to go back and pull out a list of all of the bowlers or all the groups a person is part of.
If you don't already use primary and foreign keys (which you should!), you may also want to consider using triggers in your design. So if you have a specific set of rules (e.g. you want to create Jane with id = 1 and add her to existing group 2), a trigger that fires after Jane is inserted into people can automatically create the pair personid=1, groupid=2 in the peoplegroups table. You can also create views with specific selects to see the data you want. For example, if you want a query that shows only the people's names and the group names, you could create a view 'PeopleView':
CREATE VIEW PeopleView AS
SELECT P.name, G.groupname
FROM People P
INNER JOIN peoplegroups PG ON P.pID = PG.pID
INNER JOIN [Groups] G ON G.gID = PG.gID
then you can query 'PeopleView' saying
SELECT * FROM PeopleView WHERE groupname = 'bowlers'
When inserting new data into these tables, the "linking" table you are referring to needs to contain the primary keys of both other tables as foreign keys. So the [People] table's pID and the [Groups] table's gID should both be foreign keys in the [peoplegroups] table. To create a new "link" in [peoplegroups], the records must already exist in both the [People] table and the [Groups] table BEFORE you try to create the link. I hope this helps.
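Putting the pieces together for the Jane example: insert the person, read back the generated pID, insert the link row in the same transaction, then query through the linking table in both directions. This is a sketch with Python's built-in sqlite3 module. The schema follows the column names in the question, but Jane's age, the second group and the "years" values are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
    CREATE TABLE People (pID INTEGER PRIMARY KEY, name TEXT, age INTEGER);
    CREATE TABLE [Groups] (gID INTEGER PRIMARY KEY, groupname TEXT, years INTEGER);
    CREATE TABLE peoplegroups (
        pID INTEGER NOT NULL REFERENCES People(pID),
        gID INTEGER NOT NULL REFERENCES [Groups](gID),
        PRIMARY KEY (pID, gID)
    );
    INSERT INTO [Groups] VALUES (1, 'golfers', 3), (2, 'bowlers', 5);
""")

# One transaction: the person row and the link row commit together or not at all.
with conn:
    cur = conn.execute("INSERT INTO People (name, age) VALUES (?, ?)", ("Jane", 30))
    jane_id = cur.lastrowid  # the pID SQLite just generated
    conn.execute("INSERT INTO peoplegroups (pID, gID) VALUES (?, ?)", (jane_id, 2))

# All members of one group.
bowlers = [r[0] for r in conn.execute("""
    SELECT p.name
    FROM People p
    JOIN peoplegroups pg ON pg.pID = p.pID
    JOIN [Groups] g ON g.gID = pg.gID
    WHERE g.groupname = 'bowlers'
""")]

# All groups of one person.
janes_groups = [r[0] for r in conn.execute("""
    SELECT g.groupname
    FROM [Groups] g
    JOIN peoplegroups pg ON pg.gID = g.gID
    WHERE pg.pID = ?
""", (jane_id,))]

print(bowlers, janes_groups)
```

The composite primary key on (pID, gID) stops the same membership from being inserted twice, and with foreign keys switched on, a link to a non-existent person or group is rejected.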

SQL Cross-Table Referencing

Okay, so I've got two tables. One table (table 1) contains a column Books_Owned_ID which stores a series of numbers in the form of 1,3,7. I have another table (table 2) which stores the Book names in one column and the book ID in another column.
What I want to do is write an SQL query which will take the numbers from Books_Owned_ID and display the names of those books in a new column. Like so:
|New Column |
Book 1 Name
Book 2 Name
Book 3 Name
I can't wrap my head around this, it's simple enough but all the threads I look on get really confusing.
Table1 contains the following columns:
|First_Name| Last_Name| Books_Owned_ID |
Table2 contains the following columns:
|Book_Name|Book_ID|
You need to do an inner join. This is a great example/reference for these
SELECT Book_Name FROM Table2
INNER JOIN Table1
ON Table1.Books_Owned_ID = Table2.Book_ID
EDIT SQL Fiddle
I will work on getting the comma-separated column split working. It won't take much extra.
EDIT 2 See this answer to build a function to split your string. Then you can do this:
SELECT Book_Name FROM Table2
WHERE Book_ID IN(SELECT FN_ListToTable(',',Table1.Books_Owned_ID) FROM Table1 s)
The core of this centers around data normalisation... Each fact is stored only once (and so is "authoritative"). You should also get into the habit of only storing a single fact in any field.
So, imagine the following table layouts...
Books
Id, Name, Description
Users
Id, Username, EmailAddress, PasswordHash, etc....
BooksOwned
UserId, BookId
So if a single user owns multiple books, there will be multiple entries in the BooksOwned table...
UserId, BookID
1, 1
1, 2
1, 3
Indicates that User 1 owns books 1 through 3.
The reason to do it this way is that it makes it much easier to query in future. You also treat BookId as an Integer instead of a string containing a list - so you don't need to worry about string manipulation to do your query.
The following would return the name of all books owned by the user with Id = 1
SELECT Books.Name
FROM BooksOwned
INNER JOIN Books
ON BooksOwned.BookId = Books.Id
WHERE BooksOwned.UserId = 1
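A quick way to try this layout is with Python's built-in sqlite3 module. The sketch below uses only the Id and Name columns of Books plus the BooksOwned table; the book names and the second user are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Books (Id INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE BooksOwned (
        UserId INTEGER,
        BookId INTEGER REFERENCES Books(Id),
        PRIMARY KEY (UserId, BookId)
    );
    -- Invented sample data: user 1 owns books 1 through 3, user 2 owns book 4.
    INSERT INTO Books VALUES
        (1, 'Book 1 Name'), (2, 'Book 2 Name'),
        (3, 'Book 3 Name'), (4, 'Book 4 Name');
    INSERT INTO BooksOwned VALUES (1, 1), (1, 2), (1, 3), (2, 4);
""")

# Same join as above, with the user id passed as a parameter.
names = [r[0] for r in conn.execute("""
    SELECT Books.Name
    FROM BooksOwned
    INNER JOIN Books ON BooksOwned.BookId = Books.Id
    WHERE BooksOwned.UserId = ?
    ORDER BY Books.Id
""", (1,))]

print(names)
```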
You need a function which takes a comma-separated list and returns a table. This is slow and fundamentally a bad idea. Really, all it does is convert this way of storing the data into the data model I describe below (see ProfessionalAmateur's answer for an example of this).
If you are just starting, change your data model. Make a linking table. Like this:
Person Table
|First_Name| Last_Name| Person_ID |
Book Table
|Book_Name|Book_ID|
PersonBook Table
|PersonID|BookID|
This table can have more than one row for each person.
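If the comma-separated column already holds data, a one-off migration into the linking table could look roughly like this. It is a sketch in Python with the built-in sqlite3 module; the Person_ID column on the old table and the sample person are assumptions, since the original Table1 has no id column listed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Old layout, with a hypothetical Person_ID added so rows can be referenced.
    CREATE TABLE Table1 (Person_ID INTEGER PRIMARY KEY, First_Name TEXT,
                         Last_Name TEXT, Books_Owned_ID TEXT);
    -- New linking table.
    CREATE TABLE PersonBook (PersonID INTEGER, BookID INTEGER,
                             PRIMARY KEY (PersonID, BookID));
    INSERT INTO Table1 VALUES (1, 'Ada', 'Example', '1,3,7');
""")

# Read the old rows, split each list once, and build one link row per book id.
old_rows = conn.execute("SELECT Person_ID, Books_Owned_ID FROM Table1").fetchall()
links = [
    (person_id, int(book_id))
    for person_id, id_list in old_rows
    for book_id in id_list.split(",")
    if book_id.strip()
]

with conn:  # commit all link rows together
    conn.executemany("INSERT INTO PersonBook (PersonID, BookID) VALUES (?, ?)", links)

migrated = conn.execute(
    "SELECT PersonID, BookID FROM PersonBook ORDER BY BookID"
).fetchall()
print(migrated)
```

After the migration the string splitting is done once, in application code, and every later query can use plain integer joins.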

SQL Query problem

Consider two tables. Technician table has fields like T_ID,T_Name. Project table has fields like P_ID,P_Name, P_Date.
A technician can work on many projects, and a project can be done by many technicians, so there is a many-to-many relationship between the two tables. To break up the many-to-many relationship, I created a new table called Assignment, which consists of the foreign keys T_ID and P_ID.
Here is the question: I want to find a list of everyone a particular technician (the technician with T_ID = 1) worked with over the last month (April 2011). For example, if Technician 1 worked with Tech 2 and Tech 3, then they qualify for the query result, and I would like the T_ID and T_Name of Tech 2 and Tech 3.
The answer can also be based on two linked queries.
Kindly let me know what will be the query for the mentioned problem.
This would find all the people who worked on assignments with the technician who has ID = 10. Maybe:
SELECT DISTINCT t.T_ID, t.T_Name
FROM Technician t, Assignment a
WHERE t.T_ID = a.T_ID
and t.T_ID <> 10
and a.P_ID IN (SELECT assign.P_ID FROM Assignment assign, Project proj WHERE assign.T_ID = 10 and assign.P_ID = proj.P_ID and DATEDIFF(day, proj.P_Date, getdate()) BETWEEN 0 AND 30)
The date is a little bit of a guess as I'm not sure on the syntax, however the rest should get the information you want.
Give the Assignment table a primary key of its own (because composite keys suck).
The two ? characters represent the ID of the technician you're asking about. For example, for "Who worked with Technician 5?", both ? would be 5.
SELECT DISTINCT
    a1.T_ID
FROM
    Assignment a1
WHERE
    a1.P_ID IN (
        SELECT a2.P_ID
        FROM Assignment a2
        WHERE a2.T_ID = ?
    )
    AND a1.T_ID != ?
;
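Here is one way the whole thing could be put together, including the date filter and the technician names, as a self-join on Assignment. It is a sketch run in SQLite through Python's sqlite3 module. The sample technicians and projects are made up, and P_Date is assumed to be stored as ISO-8601 text so that a half-open range covers April 2011.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Technician (T_ID INTEGER PRIMARY KEY, T_Name TEXT);
    CREATE TABLE Project (P_ID INTEGER PRIMARY KEY, P_Name TEXT, P_Date TEXT);
    CREATE TABLE Assignment (T_ID INTEGER, P_ID INTEGER, PRIMARY KEY (T_ID, P_ID));
    -- Invented sample data: Tech 4 shares only a March project with Tech 1.
    INSERT INTO Technician VALUES (1, 'Tech 1'), (2, 'Tech 2'), (3, 'Tech 3'), (4, 'Tech 4');
    INSERT INTO Project VALUES (100, 'April job', '2011-04-12'),
                               (200, 'March job', '2011-03-05');
    INSERT INTO Assignment VALUES (1, 100), (2, 100), (3, 100), (1, 200), (4, 200);
""")

# "mine" = assignments of the technician asked about,
# "theirs" = other technicians' assignments on the same projects.
coworkers = conn.execute("""
    SELECT DISTINCT t.T_ID, t.T_Name
    FROM Assignment mine
    JOIN Project p         ON p.P_ID = mine.P_ID
    JOIN Assignment theirs ON theirs.P_ID = mine.P_ID
                          AND theirs.T_ID <> mine.T_ID
    JOIN Technician t      ON t.T_ID = theirs.T_ID
    WHERE mine.T_ID = ?
      AND p.P_Date >= ? AND p.P_Date < ?
    ORDER BY t.T_ID
""", (1, "2011-04-01", "2011-05-01")).fetchall()

print(coworkers)
```

DISTINCT matters here because two technicians who shared several projects in the month would otherwise appear once per shared project.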