I'm pretty new to SQL Server (using ssms). I need some help to insert and organize data from one table into multiple tables (which are connected to each other by PK/FK).
The source table has the following columns:
Email, UserName, Phone
It's a messy table with lots of duplicates: same email with different username and so on..
My data tables are:
Person - PersonID(PK, int, not null)
Email - Email (nvarchar, null) , PersonID (FK, int, not null)
Phone - PhoneNumber (int, null) , PersonID (FK, int, not null)
UserName - UserName (nvarchar, null) , PersonID (FK, int, not null)
For each row in the source table, I need to check if the person already exists (by Email); if it does exist, I need to add the new data (if any), else I need to create a new person and add the data.
I searched here for some solutions and find recommendations of using CURSOR.
I tried it, but it takes a really long time to execute (hours.. and still going)
Thanks for any help!
example:
from>
EMAIL | USERNAME | PHONE
------------------------
a#a.a | john | 956165
b#b.b | smith | 123456
c#c.c | bob | 654321
d#d.d | mike | 986514
a#a.a | dan | 658732
e#e.e | dave | 147258
f#f.f | harry | 951962
b#b.b | emmy | 456789
g#g.g | kelly | 789466
h#h.h | kelly | 258369
a#a.a | ana | 852369
to>
EMAIL | PERSONID
----------------
a#a.a | 1
b#b.b | 2
c#c.c | 3
d#d.d | 4
e#e.e | 5
f#f.f | 6
g#g.g | 7
h#h.h | 8
USERNAME | PERSONID
-------------------
john | 1
smith | 2
bob | 3
mike | 4
dan | 1
dave | 5
harry | 6
emmy | 2
kelly | 7
kelly | 8
ana | 1
PHONE | PERSONID
----------------
956165 | 1
123456 | 2
654321 | 3
986514 | 4
658732 | 1
147258 | 5
951962 | 6
456789 | 2
789466 | 7
258369 | 8
852369 | 1
Cursors will generally be slower as they operate on a row-by-row basis. Using set based operations, such as a join, will yield better performance. It's somewhat older, but this article further details the implications of cursors as opposed to set operations. I wasn't entirely sure what columns you want to use to verify matches, as well as what data to add, but a basic example is below and you can fill in the columns as necessary. The Email table was used in the example. For the UPDATE, this will update existing rows based off corresponding rows in the source table. Being an INNER JOIN, only rows with matches on both sides will be impacted. In the second statement, this is an INSERT using only rows from the source table that don't exist in the Email table. This same functionality could also be accomplished using the MERGE statement, however there are a number of issues with this, including problems with deadlocks and key violations.
Update Existing Rows:
UPDATE E
SET E.ColumnA = SRC.ColumnA,
E.ColumnB = SRC.ColumnB
FROM YourDatabase.YourSchema.Email E
INNER JOIN YourDatabase.YourSchema.SourceTable SRC
ON E.Email = SRC.Email
Add New Rows:
INSERT INTO YourDatabase.YourSchema.Email (ColumnA, ColumnB)
SELECT
ColumnA,
ColumnB
FROM YourDatabase.YourSchema.SourceTable
WHERE EMAIL NOT IN ((SELECT EMAIL FROM YourDatabase.YourSchema.Email))
Related
EDIT
I've edited this question to make it a little more concise, if you see my edit history you will see my effort and 'what I've tried' but it was adding a lot of unnecessary noise and causing confusion so here is a summary of input and output:
People:
ID | FullName
--------------------
1 | Jimmy
2 | John
3 | Becky
PeopleJobRequirements:
ID | PersonId | Title
--------------------
1 | 1 | Some Requirement
2 | 1 | Another Requirement
3 | 2 | Some Requirement
4 | 3 | Another Requirement
Output:
FullName | RequirementTitle
---------------------------
Jimmy | Some Requirement
Jimmy | Another Requirement
John | Some Requirement
John | null
Becky | null
Becky | Another Requirement
Each person has 2 records, because that's how many distinct requirements there are in the table (distinct based on 'Title').
Assume there is no third table - the 'PeopleJobRequirements' is unique to each person (one person to many requirements), but there will be duplicate Titles in there (some people have the same job requirements).
Sincere apologies for any confusion caused by the recent updates.
CROSS JOIN to get equal record for each person and LEFT JOIN for matching records.
Following query should work in your scenario
select p.Id, p.FullName,r.Title
FROM People p
cross join (select distinct title from PeopleJobRequirements ) pj
left join PeopleJobRequirements r on p.id=r.personid and pj.Title=r.Title
order by fullname
Online Demo
Output
+----+----------+---------------------+
| Id | FullName | Title |
+----+----------+---------------------+
| 3 | Becky | Another Requirement |
+----+----------+---------------------+
| 3 | Becky | NULL |
+----+----------+---------------------+
| 1 | Jimmy | Some Requirement |
+----+----------+---------------------+
| 1 | Jimmy | Another Requirement |
+----+----------+---------------------+
| 2 | John | NULL |
+----+----------+---------------------+
| 2 | John | Some Requirement |
+----+----------+---------------------+
use left join, no need any subquery
select p.*,jr.*,jrr.*
from People p left join
PeopleJobRequirements jr on p.Id=jrPersonId
left join JobRoleRequirements jrr p.id=jrr.PersonId
according the explanation, People and PeopleJobRequirements tables have many to many relationship (n to n).
so first of all you'll need another table to relate these to table.
first do this and then a full join will make it right.
I want to create a new table using the results from some queries. I might be looking at this the wrong way so please feel free to let me know. Because of this I will try to make this question simple without putting my code to match each employee number with each manager level column from table2
I have two tables, one has employee names and employee numbers example
table 1
+-------------+-----------+-------------+-------------+
| emplpyeenum | firstname | last name | location |
+-------------+-----------+-------------+-------------+
| 11 | joe | free | JE |
| 22 | jill | yoyo | XX |
| 33 | yoda | null | 9U |
+-------------+-----------+-------------+-------------+
and another table with employee numbers under each manager level so basically a hierarchy example
Table 2
+---------+----------+----------+
| manager | manager2 | manager3 |
+---------+----------+----------+
| 11 | 22 | 33 |
+---------+----------+----------+
I want to make a new table that will have the names besides the numbers, so for example but with employee number beside the names
+---------+--------+----------+
| level 1 | level2 | level3 |
+---------+--------+----------+
| jill | joe | yoda |
+---------+--------+----------+
How can I do this?
edit sorry guys I don't have permission to create a new table or view
Why not change your table2 to this?
+------------+----------+
| EmployeeId | ManagerId|
+------------+----------+
| 11 | NULL |
+------------+----------+
| 22 | 11 |
+------------+----------+
| 33 | 22 |
+------------+----------+
Then you can do what you want with the data. At least your data will be properly normalized. In your table2. What happen if employee 33 hire another employee below him? You will add another column?
Based on your available table, this should give you the result you want.
SELECT m1.firstname, m2.firstname, m3.firstname
FROM table2 t
LEFT JOIN table1 m1 ON m1.employeenum = t.manager
LEFT JOIN table1 m2 ON m2.employeenum = t.manager2
LEFT JOIN table1 m3 ON m3.employeenum = t.manager3
You can just do a basic create table, then do a insert select to that will fill the table the way you need it. All you have to do is replace the select statement that I provided with the one you used to create the levels table output.
create table Levels
(
level1 varchar(25),
level2 varchar(25),
level3 varchar(25)
)
insert into Levels(level1, level2, level3)
select * from tables --here you would put the select statement that you used to create the information. If you dont have this script then let me know
I'm working with a SQLite database that receives large data dumps on a regular basis from several sources. Unfortunately, those sources aren't intelligent about what they dump, and I end up with a lot of repeated records from one time to the next. I'm looking for a way to remove these repeated records without affecting the records that have legitimately changed from the past dump to this one.
Here's the general structure of the data (_id is the primary key):
| _id | _dateUpdated | _dateEffective | _dateExpired | name | status | location |
|-----|--------------|----------------|--------------|------|--------|----------|
| 1 | 2016-05-01 | 2016-05-01 | NULL | Fred | Online | USA |
| 2 | 2016-05-01 | 2016-05-01 | NULL | Jim | Online | USA |
| 3 | 2016-05-08 | 2016-05-08 | NULL | Fred | Offline| USA |
| 4 | 2016-05-08 | 2016-05-08 | NULL | Jim | Online | USA |
| 5 | 2016-05-15 | 2016-05-15 | NULL | Fred | Offline| USA |
| 6 | 2016-05-15 | 2016-05-15 | NULL | Jim | Online | USA |
I'd like to be able to reduce this data to something like this:
| _id | _dateUpdated | _dateEffective | _dateExpired | name | status | location |
|-----|--------------|----------------|--------------|------|--------|----------|
| 1 | 2016-05-01 | 2016-05-01 | 2016-05-07 | Fred | Online | USA |
| 2 | 2016-05-15 | 2016-05-01 | NULL | Jim | Online | USA |
| 3 | 2016-05-15 | 2016-05-08 | NULL | Fred | Offline| USA |
The idea here is that rows 4, 5, and 6 exactly duplicate rows 2 and 3 except for the timestamps (I'd need to compare by all three fields - name, status, location). However, row 3 does not duplicate row 1 (status changed from Online to Offline), so the _dateExpired field is set in row 1, and row 3 becomes the most recent record.
I'm querying this table with something like this:
SELECT * FROM Data WHERE
date(_dateEffective) <= date("now")
AND (_dateExpired IS NULL OR date(_dateExpired) > date("now"))
Is this sort of reduction possible in SQLite?
I am still a beginner to SQL and database design in general, so it's possible that I haven't structured the database in the best way. I'm open to suggestions there as well...I'm going for the ability to query data at a given point in time - for example, "what was Jim's status around 2016-05-06?"
Thanks in advance!
Consider using a staging table where the dump file goes into a DumpTable (regularly cleaned out before each dump) and then an INSERT...SELECT query migrates to your final table.
Now the SELECT portion maintains a correlated subquery (to calculate new [_dateExpired] for needed rows) and derived table subquery (to filter out non-dups according to your criteria). Finally, the LEFT JOIN...NULL with FinalTable is to ensure no duplicate records are appended, assuming [_id] is a unique identifier. Below is the routine:
Clean Out DumpTable
DELETE FROM DumpTable;
Run Dump Routine to be appended into DumpTable
Append Records to FinalTable
INSERT INTO FinalTable ([_id], [_dateUpdated], [_dateEffective], [_dateExpired],
[name], status, location)
SELECT d.[_id], d.[_dateUpdated], d.[_dateEffective],
(SELECT Min(date(sub.[_dateEffective], '-1 day'))
FROM DumpTable sub
WHERE sub.[name] = DumpTable.[name]
AND sub.[_dateEffective] > DumpTable.[_dateEffective]
AND sub.status <> DumpTable.status) As calcExpired
d.name, d.status, d.location
FROM DumpTable d
INNER JOIN
(SELECT Min(DumpTable.[_id]) AS min_id,
DumpTable.name, DumpTable.status
FROM DumpTable
GROUP BY DumpTable.name, DumpTable.status) AS c
ON (c.name = d.name)
AND (c.min_id = d.[_id])
AND (c.status = d.status)
LEFT JOIN FinalTable f
ON d.[_id] = f.[_id]
WHERE f.[_id] IS NULL;
-- INSERTED RECORDS:
-- _id _dateUpdated _dateEffective _dateExpired name status location
-- 1 2016-05-01 2016-05-01 2016-05-07 Fred Online USA
-- 2 2016-05-01 2016-05-01 Jim Online USA
-- 3 2016-05-08 2016-05-08 Fred Offline USA
Is this sort of reduction possible in SQLite?
The answer to any "reduction" question in SQL is always Yes. The trick is to find what axes you're reducing along.
Here's a partial solution to illustrate; it gives the first Online date for each name & location.
select min(_dateEffective) as start_date
, name
, location
from Data
where status = 'Online'
group by
name
, location
With an outer join back to the table (on name & location) where the status is 'Offline' and the _dateEffective is greater than start_date, you get your _dateExpired.
_id is the primary key
There is a commonly held misunderstanding that every table needs some kind of sequential "ID" number as a primary key. The key you really care about is known as a natural key, 1 or more columns in the data that uniquely identify the data. In your case, it looks to me like that's _dateEffective, name, status, and location. At the very least, declare them unique to prevent accidental duplication.
I am working on a project and for my login credentials checking process I am trying to create a view in which the name,surname,username and password of customers,workers and admins are stored so that I can search faster and I have two questions.
Do you think it is a good idea to do that ?
If yes, can you help me how to do that?
Thank you in advance.
1) yes, but for simplicity rather than performance (and a few other reasons)
2) CREATE OR REPLACE VIEW viewname AS your_select_statement;
If the front-end is a single interface for both customers and employees then the tables should not be separated in the first place. If you have a person who is both a customer and a worker then they would appear on two tables and it is possible the data would not be synchronized between the two and if you create a view then they would appear twice. Instead create a single table for all people and have separate tables for data specific to customers, workers and admins.
Something like:
People
id | firstname | surname | username | password_hash | password_salt
--------------------------------------------------------------------
1 | alice | abbot | aa | abc | 123
2 | bob | barnes | bb | def | 456
3 | charlotte | carol | cc | ghi | 789
4 | daniel | david | dd | jkl | 036
Customers
id | Credit_Limit | has_Trade_Account
-------------------------------------
2 | 0 | 0
3 | 2000 | 1
Workers
id | Joining_Date | Grade
--------------------------
1 | 2015-01-01 | 5
3 | 2000-12-25 | 3
Admins
id | Edit_Permissions
----------------------
3 | Orders
3 | Stock
I'm generating test data for a new database, and I'm having trouble populating one of the foreign key fields. I need to create a relatively large number (1000) of entries in a table (SurveyResponses) that has a foreign key to a table with only 6 entries (Surveys)
The database already has a Schools table that has a few thousand records. For arguments sake lets say it looks like this
Schools
+----+-------------+
| Id | School Name |
+----+-------------+
| 1 | PS 1 |
| 2 | PS 2 |
| 3 | PS 3 |
| 4 | PS 4 |
| 5 | PS 5 |
+----+-------------+
I'm creating a new Survey table. It will only have about 3 rows.
Survey
+----+-------------+
| Id | Col2 |
+----+-------------+
| 1 | 2014 Survey |
| 2 | 2015 Survey |
| 3 | 2016 Survey |
+----+-------------+
SurveyResponses simply ties a school to a survey.
Survey Responses
+----+----------+----------+
| Id | SchoolId | SurveyId |
+----+----------+----------+
| 1 | 1 | 1 |
| 2 | 2 | 2 |
| 3 | 3 | 1 |
| 4 | 4 | 3 |
| 5 | 5 | 2 |
+----+----------+----------+
Populating the SurveyId field is what's giving me the most trouble. I can randomly select 1000 Schools, but I haven't figured out a way to generate 1000 random SurveyIds. I've been trying to avoid a while loop, but maybe that's the only option?
I've been using Red Gate SQL Data Generator to generate some of my test data, but in this case I'd really like to understand how this can be done with raw SQL.
Here is one way, using a correlated subquery to get a random survey associated with each school:
select s.schoolid,
(select top 1 surveyid
from surveys
order by newid()
) as surveyid
from schools s;
Note: This doesn't seem to work. Here is a SQL Fiddle showing the non-workingness. I am quite surprised it doesn't work, because newid() should be a
EDIT:
If you know the survey ids have no gaps and start with 1, you can do:
select 1 + abs(checksum(newid()) % 3) as surveyid
I did check that this does work.
EDIT II:
This appears to be overly aggressive optimization (in my opinion). Correlating the query appears to fix the problem. So, something like this should work:
select s.schoolid,
(select top 1 surveyid
from surveys s2
where s2.surveyid = s.schoolid or s2.surveyid <> s.schoolid -- nonsensical condition to prevent over optimization
order by newid()
) as surveyid
from schools s;
Here is a SQL Fiddle demonstrating this.