How do I generate a crosswalk ID between two SQL tables?

I have a SQL table consisting of names, addresses and some associated numerical data paired with a code. The table is structured so that each number-code pair has its own row, with the address info repeated on every row. An abbreviated version is below; let's call it tblPeopleData:
Name           Address                   ArbitraryCode  ArbitraryData
----------------------------------------------------------------------------
John Adams     45 Main St, Rochester NY  a              111
John Adams     45 Main St, Rochester NY  a              231
John Adams     45 Main St, Rochester NY  a              123
John Adams     45 Main St, Rochester NY  b              111
John Adams     45 Main St, Rochester NY  c              111
John Adams     45 Main St, Rochester NY  d              123
John Adams     45 Main St, Rochester NY  d              124
Jane McArthur  12 1st Ave, Chicago IL    a              111
Jane McArthur  12 1st Ave, Chicago IL    a              231
Jane McArthur  12 1st Ave, Chicago IL    a              123
Jane McArthur  12 1st Ave, Chicago IL    b              111
Jane McArthur  12 1st Ave, Chicago IL    c              111
Jane McArthur  12 1st Ave, Chicago IL    e              123
Jane McArthur  12 1st Ave, Chicago IL    e              124
My problem is that this table is absolutely massive (~10 million rows) and I'm trying to split it up to make traversal less staggeringly sluggish.
What I've done so far is to make a table of just addresses, using something like:
SELECT DISTINCT Address FROM tblPeopleData (etc.)
Leaving me with:
Name           Address
------------------------------------------
John Adams     45 Main St, Rochester NY
Jane McArthur  12 1st Ave, Chicago IL
...just a list of addresses. I want to be able to look up each address and see which names reside at that address, so I assigned each address a UniqueID, so that now I have this (the table has around 500,000 rows in my dataset):
Name           Address                   AddressID
--------------------------------------------------------
John Adams     45 Main St, Rochester NY  000001
Jane McArthur  12 1st Ave, Chicago IL    000002
In order to look up people by address, though, I need this AddressID field added to every row of tblPeopleData, so that each address there is associated with its AddressID. I would then have:
Name           Address                   ArbitraryCode  ArbitraryData  AddressID
----------------------------------------------------------------------------------------
John Adams     45 Main St, Rochester NY  a              111            00001
John Adams     45 Main St, Rochester NY  a              231            00001
John Adams     45 Main St, Rochester NY  a              123            00001
John Adams     45 Main St, Rochester NY  b              111            00001
John Adams     45 Main St, Rochester NY  c              111            00001
John Adams     45 Main St, Rochester NY  d              123            00001
John Adams     45 Main St, Rochester NY  d              124            00001
Jane McArthur  12 1st Ave, Chicago IL    a              111            00002
Jane McArthur  12 1st Ave, Chicago IL    a              231            00002
Jane McArthur  12 1st Ave, Chicago IL    a              123            00002
Jane McArthur  12 1st Ave, Chicago IL    b              111            00002
Jane McArthur  12 1st Ave, Chicago IL    c              111            00002
Jane McArthur  12 1st Ave, Chicago IL    e              123            00002
Jane McArthur  12 1st Ave, Chicago IL    e              124            00002
How do I make the jump from having a UniqueID for each address in my unique-addresses table to adding that AddressID to every row with the corresponding address back in tblPeopleData?

Just backfill the calculated AddressID into tblPeopleData (once it has an AddressID column): in SQL Server you can combine an UPDATE with a FROM clause, joining tables just as you would in a SELECT:
UPDATE pd
SET AddressID = a.AddressID
FROM tblPeopleData pd
INNER JOIN tblAddressData a
ON pd.Address = a.Address;
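A minimal sketch of the whole round trip (build the distinct-address table, then backfill the ID), using Python's built-in sqlite3 module so it runs without a server. Two assumptions: SQLite's INTEGER PRIMARY KEY stands in for however you generated AddressID, and because older SQLite versions lack UPDATE ... FROM, the backfill uses a correlated subquery instead, which gives the same result here. The sample rows are a subset of the question's data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Denormalized source table, as in the question (subset of the sample rows).
cur.execute("""CREATE TABLE tblPeopleData (
    Name TEXT, Address TEXT, ArbitraryCode TEXT, ArbitraryData INTEGER)""")
cur.executemany(
    "INSERT INTO tblPeopleData VALUES (?, ?, ?, ?)",
    [
        ("John Adams", "45 Main St, Rochester NY", "a", 111),
        ("John Adams", "45 Main St, Rochester NY", "b", 111),
        ("Jane McArthur", "12 1st Ave, Chicago IL", "a", 111),
        ("Jane McArthur", "12 1st Ave, Chicago IL", "e", 124),
    ],
)

# One row per distinct address; INTEGER PRIMARY KEY generates the ID.
# Ordering by first appearance gives John's address ID 1, Jane's ID 2.
cur.execute("CREATE TABLE tblAddressData (AddressID INTEGER PRIMARY KEY, Address TEXT UNIQUE)")
cur.execute("""INSERT INTO tblAddressData (Address)
               SELECT Address FROM tblPeopleData
               GROUP BY Address ORDER BY MIN(rowid)""")

# Add the crosswalk column, then backfill it from the lookup table.
cur.execute("ALTER TABLE tblPeopleData ADD COLUMN AddressID INTEGER")
cur.execute("""UPDATE tblPeopleData
               SET AddressID = (SELECT a.AddressID FROM tblAddressData a
                                WHERE a.Address = tblPeopleData.Address)""")
conn.commit()

rows = cur.execute("SELECT Name, AddressID FROM tblPeopleData ORDER BY rowid").fetchall()
print(rows)
```

On 10 million rows, an index on tblAddressData(Address) matters far more than the UPDATE syntax; the UNIQUE constraint above provides one.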

You would alter the table to have the address id:
alter table tblPeopleData add AddressId int references Address(AddressId);
Then you can update the value using a JOIN:
update tblPeopleData pd JOIN
Address a
ON pd.Address = a.Address
set pd.AddressId = a.AddressId;
You will definitely want an index on Address(Address) for this.
Then, you can drop the old column:
alter table tblPeopleData drop column Address;
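A sketch of this answer's steps in Python's sqlite3 (an assumption; the UPDATE ... JOIN ... SET form above is MySQL syntax): add the column, index Address(Address), backfill, then query people by AddressId. SQLite has no UPDATE ... JOIN, so a correlated subquery stands in for it, and the DROP COLUMN step is left out because SQLite only supports it from version 3.35. The names_at helper is hypothetical, added only to show the lookup the question is after.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Lookup table, named Address as in this answer; the rows are illustrative.
cur.execute("CREATE TABLE Address (AddressId INTEGER PRIMARY KEY, Address TEXT)")
cur.executemany("INSERT INTO Address (Address) VALUES (?)",
                [("45 Main St, Rochester NY",), ("12 1st Ave, Chicago IL",)])
# The index on Address(Address) that the answer recommends for the join.
cur.execute("CREATE UNIQUE INDEX ix_Address_Address ON Address (Address)")

cur.execute("CREATE TABLE tblPeopleData (Name TEXT, Address TEXT, ArbitraryCode TEXT)")
cur.executemany("INSERT INTO tblPeopleData VALUES (?, ?, ?)", [
    ("John Adams", "45 Main St, Rochester NY", "a"),
    ("John Adams", "45 Main St, Rochester NY", "d"),
    ("Jane McArthur", "12 1st Ave, Chicago IL", "a"),
])

# Step 1: add the foreign-key column.
cur.execute("ALTER TABLE tblPeopleData ADD COLUMN AddressId INTEGER REFERENCES Address(AddressId)")
# Step 2: backfill it (correlated subquery in place of UPDATE ... JOIN).
cur.execute("""UPDATE tblPeopleData
               SET AddressId = (SELECT a.AddressId FROM Address a
                                WHERE a.Address = tblPeopleData.Address)""")
# Index the new column so lookups by AddressId stay fast.
cur.execute("CREATE INDEX ix_People_AddressId ON tblPeopleData (AddressId)")
conn.commit()

def names_at(address_id):
    """Hypothetical helper: distinct names living at one AddressId."""
    result = cur.execute(
        "SELECT DISTINCT Name FROM tblPeopleData WHERE AddressId = ? ORDER BY Name",
        (address_id,))
    return [r[0] for r in result]

print(names_at(1), names_at(2))
```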
Note:
It might be faster to save the joined results in a temporary table instead, because updating every row is going to generate lots and lots of log records. For this, save the results, truncate the original table, and reload the data:
SELECT . . . , a.AddressId
INTO tmp_tblPeopleData
FROM tblPeopleData pd JOIN
Address a
ON pd.Address = a.Address;
TRUNCATE TABLE tblPeopleData;
INSERT INTO tblPeopleData( . . .)
SELECT . . .
FROM tmp_tblPeopleData;
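The temporary-table approach, sketched with Python's sqlite3 under two substitutions: CREATE TEMP TABLE ... AS SELECT plays the role of SELECT ... INTO, and an unqualified DELETE replaces TRUNCATE, which SQLite lacks. The explicit column lists exist only for this toy schema; in your real table they are the columns elided as . . . above, and the sketch assumes AddressId has already been added to tblPeopleData.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Address (AddressId INTEGER PRIMARY KEY, Address TEXT UNIQUE)")
cur.executemany("INSERT INTO Address (Address) VALUES (?)",
                [("45 Main St, Rochester NY",), ("12 1st Ave, Chicago IL",)])
# AddressId already exists on tblPeopleData but is still empty (NULL).
cur.execute("""CREATE TABLE tblPeopleData (Name TEXT, Address TEXT, ArbitraryCode TEXT,
               ArbitraryData INTEGER, AddressId INTEGER)""")
cur.executemany(
    "INSERT INTO tblPeopleData (Name, Address, ArbitraryCode, ArbitraryData) VALUES (?, ?, ?, ?)",
    [("John Adams", "45 Main St, Rochester NY", "a", 111),
     ("Jane McArthur", "12 1st Ave, Chicago IL", "e", 123)])

# 1. Save the joined result (SQLite's stand-in for SELECT ... INTO).
cur.execute("""CREATE TEMP TABLE tmp_tblPeopleData AS
               SELECT pd.Name, pd.Address, pd.ArbitraryCode, pd.ArbitraryData, a.AddressId
               FROM tblPeopleData pd JOIN Address a ON pd.Address = a.Address""")
# 2. SQLite has no TRUNCATE; an unqualified DELETE empties the table.
cur.execute("DELETE FROM tblPeopleData")
# 3. Reload, this time with AddressId filled in, then clean up.
cur.execute("""INSERT INTO tblPeopleData (Name, Address, ArbitraryCode, ArbitraryData, AddressId)
               SELECT Name, Address, ArbitraryCode, ArbitraryData, AddressId
               FROM tmp_tblPeopleData""")
cur.execute("DROP TABLE tmp_tblPeopleData")
conn.commit()

rows = cur.execute("SELECT Name, AddressId FROM tblPeopleData ORDER BY Name").fetchall()
print(rows)
```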

Related

Presenting Data uniformly between two different table presentations with SQL

Hello everyone, I have a problem.
Table 1 (sorted) is laid out like this:
User ID  Producer ID  Company Number
JWROSE   23401        234
KXPEAR   23903        239
LMWEEM   27902        279
KJMORS   18301        183
Table 2 (unsorted) looks like this:
Client Name       City           Company Number
Rajat Smith       London         JWROSE
Robert Singh      Cleveland      KXPEAR
Alberto Johnson   New York City  LMWEEM
Betty Lee         Dallas         KJMORS
Chase Galvez      Houston        23401
Hassan Jackson    Seattle        23903
Tooti Fruity      Boise          27902
Joe Trump         Tokyo          18301
Donald Biden      Cairo          234
Mike Harris       Rome           239
Kamala Pence      Moscow         279
Adolf Washington  Bangkok        183
Now… Table 1 has all of the User IDs and Producer IDs correctly lined up on the same row as their Company Number.
I want to pull all the data, correctly matched and sorted, like this:
Client Name       City           User ID  Producer ID  Company Number
Rajat Smith       London         JWROSE   23401        234
Robert Singh      Cleveland      KXPEAR   23903        239
Alberto Johnson   New York City  LMWEEM   27902        279
Betty Lee         Dallas         KJMORS   18301        183
Chase Galvez      Houston        JWROSE   23401        234
Hassan Jackson    Seattle        KXPEAR   23903        239
Tooti Fruity      Boise          LMWEEM   27902        279
Joe Trump         Tokyo          KJMORS   18301        183
Donald Biden      Cairo          JWROSE   23401        234
Mike Harris       Rome           KXPEAR   23903        239
Kamala Pence      Moscow         LMWEEM   27902        279
Adolf Washington  Bangkok        KJMORS   18301        183
Query:
Select
b.client_name,
b.city,
a.user_id,
a.producer_id,
a.company_number
From Table 1 A
Left Join Table 2 B On a.company….
And this is where I don't know what to do, because both tables contain the same values, but the Company Number column in Table 2 is mixed with User IDs and Producer IDs. However, we know which Company Number each of those IDs is associated with.
As I mentioned in the comments (as did others), the real problem is your design. "The fact that UserID is clearly a varchar, while the other 2 columns are an int really does not make this any better"; it means the join is not simple (and certainly not SARGable).
To get the data in the correct order as well, you need a column to order it by, which your data lacks. I have therefore added a pseudo column, MissingIDColumn, to represent the column you need to add to your data; you can do that when you fix the design:
SELECT T2.ClientName,
T2.City,
T1.UserID,
T1.ProducerID,
T1.CompanyNumber
FROM (VALUES('JWROSE',23401,234),
('KXPEAR',23903,239),
('LMWEEM',27902,279),
('KJMORS',18301,183))T1(UserID,ProducerID,CompanyNumber)
JOIN (VALUES(1,'Rajat Smith ','London ','JWROSE'),
(2,'Robert Singh ','Cleveland ','KXPEAR'),
(3,'Alberto Johnson ','New York City','LMWEEM'),
(4,'Betty Lee ','Dallas ','KJMORS'),
(5,'Chase Galvez ','Houston ','23401'),
(6,'Hassan Jackson ','Seattle ','23903'),
(7,'Tooti Fruity ','Boise ','27902'),
(8,'Joe Trump ','Tokyo ','18301'),
(9,'Donald Biden ','Cairo ','234'),
(10,'Mike Harris ','Rome ','239'),
(11,'Kamala Pence ','Moscow ','279'),
(12,'Adolf Washington','Bangkok ','183'))T2(MissingIDColumn,ClientName,City,CompanyNumber) ON T2.CompanyNumber IN (T1.UserID,CONVERT(varchar(6),T1.ProducerID),CONVERT(varchar(6),T1.CompanyNumber))
ORDER BY MissingIDColumn;
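The same matching logic as a runnable sketch in Python's sqlite3 (an assumption; the answer is T-SQL). SQLite can't alias the columns of an inline VALUES list the way T-SQL does, so the two inline tables become real tables, and CAST(... AS TEXT) replaces CONVERT(varchar(6), ...). MissingIDColumn is still the answer's stand-in for the ordering column your data needs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE T1 (UserID TEXT, ProducerID INTEGER, CompanyNumber INTEGER)")
cur.executemany("INSERT INTO T1 VALUES (?, ?, ?)", [
    ("JWROSE", 23401, 234), ("KXPEAR", 23903, 239),
    ("LMWEEM", 27902, 279), ("KJMORS", 18301, 183)])
# T2.CompanyNumber is TEXT because it mixes user IDs with numeric IDs.
cur.execute("CREATE TABLE T2 (MissingIDColumn INTEGER, ClientName TEXT, City TEXT, CompanyNumber TEXT)")
cur.executemany("INSERT INTO T2 VALUES (?, ?, ?, ?)", [
    (1, "Rajat Smith", "London", "JWROSE"),
    (2, "Robert Singh", "Cleveland", "KXPEAR"),
    (3, "Alberto Johnson", "New York City", "LMWEEM"),
    (4, "Betty Lee", "Dallas", "KJMORS"),
    (5, "Chase Galvez", "Houston", "23401"),
    (6, "Hassan Jackson", "Seattle", "23903"),
    (7, "Tooti Fruity", "Boise", "27902"),
    (8, "Joe Trump", "Tokyo", "18301"),
    (9, "Donald Biden", "Cairo", "234"),
    (10, "Mike Harris", "Rome", "239"),
    (11, "Kamala Pence", "Moscow", "279"),
    (12, "Adolf Washington", "Bangkok", "183"),
])

# Match T2's overloaded column against any of T1's three keys, casting ints to text.
rows = cur.execute("""
    SELECT T2.ClientName, T2.City, T1.UserID, T1.ProducerID, T1.CompanyNumber
    FROM T1
    JOIN T2 ON T2.CompanyNumber IN (T1.UserID,
                                    CAST(T1.ProducerID AS TEXT),
                                    CAST(T1.CompanyNumber AS TEXT))
    ORDER BY T2.MissingIDColumn""").fetchall()
for r in rows:
    print(r)
```

Because every comparison goes through a cast or an OR-style IN list, no index on T1 can be used for the join, which is the non-SARGable cost mentioned above.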

SQL logic for getting records in a single row for a unique id

[Screenshot: Cognos Report Studio Query Explorer]
Below is a snapshot of the table.
Acctno  ClientNo  ClientName   PrimaryOffId  SecondaryOffID
101     11111     ABC corp     3             Not Defined
102     11116     XYZ Inc      5             Not Defined
103     11113     PQRS Corp    2             9
104     55555     Food LLC     4             11
105     99999     Kwlg Co      1             Not Defined
106     99999     Kwlg Co      1             Not Defined
107     11112     LMN Corp     Not Defined   6
108     11112     LMN Corp     Not Defined   6
109     11115     Sleep Co     4             10
110     44444     Cool Co      Not Defined   8
111     11114     Sail LLC     3             Not Defined
112     66666     Fun Inc      1             Not Defined
113     88888     Job LLC      5             12
114     22222     Acc Co       Not Defined   Not Defined
115     77777     Good Corp    2             Not Defined
116     33333     City LLC     Not Defined   7
117     33333     City LLC     Not Defined   7
118     33333     City LLC     Not Defined   7
119     11111     ABC corp     3             Not Defined
I want to replace PrimaryOffID and SecondaryOffID with the corresponding names from this table:
EmpID  Names
1      Cathy
2      Chris
3      John
4      Kevin
5      Mark
6      Celine
7      Jane
8      Phil
9      Jess
10     Jose
11     Nick
12     Rosy
The result should look like this. Notice that if Cathy is the primary officer, she can't be the secondary officer, and vice versa; this rule applies to all the names.
Acctno  ClientNo  Client Name  PrimOffName  SecondaryOffName
101     11111     ABC corp     John         Not Defined
102     11116     XYZ Inc      Mark         Not Defined
103     11113     PQRS Corp    Chris        Jess
104     55555     Food LLC     Kevin        Nick
105     99999     Kwlg Co      Cathy        Not Defined
106     99999     Kwlg Co      Cathy        Not Defined
107     11112     LMN Corp     Not Defined  Celine
108     11112     LMN Corp     Not Defined  Celine
109     11115     Sleep Co     Kevin        Jose
110     44444     Cool Co      Not Defined  Phil
111     11114     Sail LLC     John         Not Defined
112     66666     Fun Inc      Cathy        Not Defined
113     88888     Job LLC      Mark         Rosy
114     22222     Acc Co       Not Defined  Not Defined
115     77777     Good Corp    Chris        Not Defined
116     33333     City LLC     Not Defined  Jane
117     33333     City LLC     Not Defined  Jane
118     33333     City LLC     Not Defined  Jane
119     11111     ABC corp     John         Not Defined
But instead it looks like this:
Acctno  ClientNo  ClientName   PrimOffName  SecondaryOffName
101     11111     ABC corp     John         Not Defined
102     11116     XYZ Inc      Mark         Not Defined
103     11113     PQRS Corp    Chris        Not Defined
103     11113     PQRS Corp    Not Defined  Jess
104     55555     Food LLC     Kevin        Not Defined
104     55555     Food LLC     Not Defined  Nick
105     99999     Kwlg Co      Cathy        Not Defined
106     99999     Kwlg Co      Cathy        Not Defined
107     11112     LMN Corp     Not Defined  Celine
108     11112     LMN Corp     Not Defined  Celine
109     11115     Sleep Co     Kevin        Not Defined
109     11115     Sleep Co     Not Defined  Jose
110     44444     Cool Co      Not Defined  Phil
111     11114     Sail LLC     John         Not Defined
112     66666     Fun Inc      Cathy        Not Defined
113     88888     Job LLC      Mark         Not Defined
113     88888     Job LLC      Not Defined  Rosy
114     22222     Acc Co       Not Defined  Not Defined
115     77777     Good Corp    Chris        Not Defined
116     33333     City LLC     Not Defined  jane
117     33333     City LLC     Not Defined  jane
118     33333     City LLC     Not Defined  jane
119     11111     ABC corp     John         Not Defined
Notice that Acctno is no longer unique: wherever both name fields should be filled in on one row, the output splits them across two rows, creating duplicate records. I tried various options, but none worked. Please be aware that I am creating this report in Cognos Studio. Please suggest a query that gets the desired result. Thanks in advance; I appreciate your help.
You don't state which version of Cognos you're using; "Cognos Studio" is ambiguous. I'm most familiar with 8.4.1, and you also don't say whether you're trying to define this in the Cognos model, Query Studio, Event Studio or Report Studio.
Second, you should always show what you've got so far when asking questions on StackOverflow. People want to see what you have done so they can show you what to fix, not repeat the lion's share of the work. That's why you got downvotes.
As for plain SQL, you'll want to do this:
SELECT a.Acctno, a.ClientNo, a.ClientName, coalesce(e1.Names,'Not Defined') "PrimaryOffName", coalesce(e2.Names,'Not Defined') "SecondaryOffName"
FROM Account a
LEFT OUTER JOIN Emp e1
ON a.PrimaryOffID = e1.EmpID
LEFT OUTER JOIN Emp e2
ON a.SecondaryOffID = e2.EmpID
I made up table names. You can do this in Report Studio by creating two queries for Emp and outer joining them in succession to the Account query.
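A runnable sketch of the corrected query using Python's sqlite3, with the answer's made-up Account and Emp tables and a handful of the question's rows. One assumption: an officer that is "Not Defined" is stored as NULL here. Joining Emp twice under two aliases is what keeps each account on a single row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# Account and Emp are the answer's made-up names; NULL stands in for "Not Defined".
cur.execute("""CREATE TABLE Account (Acctno INTEGER PRIMARY KEY, ClientNo INTEGER,
               ClientName TEXT, PrimaryOffID INTEGER, SecondaryOffID INTEGER)""")
cur.executemany("INSERT INTO Account VALUES (?, ?, ?, ?, ?)", [
    (101, 11111, "ABC corp", 3, None),
    (103, 11113, "PQRS Corp", 2, 9),
    (107, 11112, "LMN Corp", None, 6),
    (114, 22222, "Acc Co", None, None),
])
cur.execute("CREATE TABLE Emp (EmpID INTEGER PRIMARY KEY, Names TEXT)")
cur.executemany("INSERT INTO Emp VALUES (?, ?)",
                [(2, "Chris"), (3, "John"), (6, "Celine"), (9, "Jess")])

# Emp joined twice, once per officer role. EmpID is unique, so each LEFT JOIN
# matches at most one row and every account still yields exactly one row.
rows = cur.execute("""
    SELECT a.Acctno, a.ClientNo, a.ClientName,
           COALESCE(e1.Names, 'Not Defined') AS PrimaryOffName,
           COALESCE(e2.Names, 'Not Defined') AS SecondaryOffName
    FROM Account a
    LEFT OUTER JOIN Emp e1 ON a.PrimaryOffID = e1.EmpID
    LEFT OUTER JOIN Emp e2 ON a.SecondaryOffID = e2.EmpID
    ORDER BY a.Acctno""").fetchall()
for r in rows:
    print(r)
```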
If you're able to, you'll want to move the OffID fields to a separate junction table and remove them from the Account table. You can then create a Status field or flag in that junction table that identifies the primary and secondary officers.
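A sketch of that junction-table design, again in Python's sqlite3. AccountOfficer and its Role values 'P'/'S' are hypothetical names. PRIMARY KEY (Acctno, Role) allows at most one officer per role on an account, and UNIQUE (Acctno, EmpID) enforces the question's rule that the same person can't be both primary and secondary. Conditional aggregation (MAX(CASE ...)) turns the junction rows back into one line per account.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Account (Acctno INTEGER PRIMARY KEY, ClientName TEXT)")
cur.execute("CREATE TABLE Emp (EmpID INTEGER PRIMARY KEY, Names TEXT)")
# Hypothetical junction table: one row per (account, officer), with a role flag.
cur.execute("""CREATE TABLE AccountOfficer (
    Acctno INTEGER REFERENCES Account(Acctno),
    EmpID  INTEGER REFERENCES Emp(EmpID),
    Role   TEXT CHECK (Role IN ('P', 'S')),  -- P = primary, S = secondary
    PRIMARY KEY (Acctno, Role),              -- at most one officer per role
    UNIQUE (Acctno, EmpID))""")              # nobody holds both roles on one account
cur.executemany("INSERT INTO Account VALUES (?, ?)",
                [(103, "PQRS Corp"), (105, "Kwlg Co"), (114, "Acc Co")])
cur.executemany("INSERT INTO Emp VALUES (?, ?)", [(1, "Cathy"), (2, "Chris"), (9, "Jess")])
cur.executemany("INSERT INTO AccountOfficer VALUES (?, ?, ?)",
                [(103, 2, "P"), (103, 9, "S"), (105, 1, "P")])
conn.commit()

# Pivot the role rows back into two columns, one output row per account.
rows = cur.execute("""
    SELECT a.Acctno, a.ClientName,
           COALESCE(MAX(CASE WHEN ao.Role = 'P' THEN e.Names END), 'Not Defined') AS PrimOffName,
           COALESCE(MAX(CASE WHEN ao.Role = 'S' THEN e.Names END), 'Not Defined') AS SecondaryOffName
    FROM Account a
    LEFT JOIN AccountOfficer ao ON ao.Acctno = a.Acctno
    LEFT JOIN Emp e ON e.EmpID = ao.EmpID
    GROUP BY a.Acctno, a.ClientName
    ORDER BY a.Acctno""").fetchall()

# Cathy is already primary on 105, so making her secondary there must fail.
try:
    cur.execute("INSERT INTO AccountOfficer VALUES (105, 1, 'S')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rows, rejected)
```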

Copy rows of data in SQL Server

Please help me come up with a solution for the situation explained below:
ID   name   address          age  hobby     GPA
---------------------------------------------------------
101  James  100 Garfield St  21   reading   3.13
101  James  100 Garfield St  21   writing   2.63
101  James  100 Garfield St  21   running   3.81
109  Tom    19 Lily Ave      19   dating    3.54
109  Tom    20 Lily Ave      19   climbing  2.76
109  Tom    21 Lily Ave      19   watching  3.91
I want to copy the set of rows with the same ID (e.g., 101) and assign each copy of the set a state abbreviation, using a single SQL query. For instance, if I add the states CA, NJ, and DE to the rows with an ID of 101, the result set should look like this:
ID   name   address          age  hobby     GPA   state
-----------------------------------------------------------------------
101  James  100 Garfield St  21   reading   3.13  CA
101  James  100 Garfield St  21   writing   2.63  CA
101  James  100 Garfield St  21   running   3.81  CA
101  James  100 Garfield St  21   reading   3.13  NJ
101  James  100 Garfield St  21   writing   2.63  NJ
101  James  100 Garfield St  21   running   3.81  NJ
101  James  100 Garfield St  21   reading   3.13  DE
101  James  100 Garfield St  21   writing   2.63  DE
101  James  100 Garfield St  21   running   3.81  DE
Please keep in mind that everything else stays the same as it was before the state abbreviations were added. Also assume I have more than three states to add to the query; say, all 50 states. Thank you for your time and effort in advance!
This should produce that result set:
select x.*, y.st
from tbl x
cross join
(select 'CA' as st union all
select 'NJ' union all
select 'DE') y
where x.id = 101
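A quick check of this answer in Python's sqlite3 (an assumption; any engine works similarly). CROSS JOIN is spelled out because SQL Server and PostgreSQL reject a plain JOIN without an ON clause. Every row for ID 101 is paired with every state in the inline list; for 50 states the UNION ALL list grows accordingly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE tbl (id INTEGER, name TEXT, address TEXT, age INTEGER, hobby TEXT, GPA REAL)")
cur.executemany("INSERT INTO tbl VALUES (?, ?, ?, ?, ?, ?)", [
    (101, "James", "100 Garfield St", 21, "reading", 3.13),
    (101, "James", "100 Garfield St", 21, "writing", 2.63),
    (101, "James", "100 Garfield St", 21, "running", 3.81),
    (109, "Tom", "19 Lily Ave", 19, "dating", 3.54),
])

# CROSS JOIN: each of the 3 rows for id 101 is repeated once per state.
rows = cur.execute("""
    SELECT x.*, y.st
    FROM tbl x
    CROSS JOIN (SELECT 'CA' AS st UNION ALL
                SELECT 'NJ' UNION ALL
                SELECT 'DE') y
    WHERE x.id = 101""").fetchall()
for r in rows:
    print(r)
```

Row order is not guaranteed without an ORDER BY, so add one if the output must be grouped by state as in the question.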
Create a new table with IDs and states:
ID ST
101 CA
101 NJ
101 DE
109 ..
Then join it to your table:
SELECT t.*, s.st
FROM tbl t
JOIN states s ON t.id = s.id
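The table-based version as a Python sqlite3 sketch with a trimmed, illustrative schema. Only ID 101's states are filled in, because the answer leaves 109's as ".."; as a result the inner join drops Tom's rows entirely, which is worth remembering if some IDs have no states yet.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE tbl (id INTEGER, name TEXT, hobby TEXT)")
cur.executemany("INSERT INTO tbl VALUES (?, ?, ?)", [
    (101, "James", "reading"), (101, "James", "writing"), (101, "James", "running"),
    (109, "Tom", "dating"),
])
# Mapping table from the answer: which states each ID should be copied into.
cur.execute("CREATE TABLE states (id INTEGER, st TEXT, PRIMARY KEY (id, st))")
cur.executemany("INSERT INTO states VALUES (?, ?)", [(101, "CA"), (101, "NJ"), (101, "DE")])

# Inner join: each tbl row is repeated once per matching state;
# IDs with no entry in states (109 here) drop out.
rows = cur.execute("""
    SELECT t.*, s.st
    FROM tbl t
    JOIN states s ON t.id = s.id
    ORDER BY s.st, t.hobby""").fetchall()
for r in rows:
    print(r)
```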

Insert a new row for every change in value in a column

I have a SQL table in which, for every change in the value of a certain column (say, column C), I want to insert a new row beneath it to create a new transaction. I am not sure how to detect that value change and insert a new row. I have been doing this with VB code on the CSV file I import into the table, but I'm unable to write it in SQL.
Sub InsertRows()
    Dim r As Long, mcol As String, i As Long
    ' find last used cell in Column A
    r = Cells(Rows.Count, "A").End(xlUp).Row
    ' get value of last used cell in column A
    mcol = Cells(r, 1).Value
    ' insert rows by looping from bottom
    For i = r To 2 Step -1
        If Cells(i, 1).Value <> mcol Then
            mcol = Cells(i, 1).Value
            Rows(i + 1).Insert
        End If
    Next i
End Sub
Here's the sample data
ID    JOB  FNAME  LNAME    ADDRESS1          ADDRESS2                DATE        Concatenated
1234  A    John   Smith    4378 Anna St      Seattle-WA-98040        12/24/2013  1234-A-41632
1234  A    John   Doe      3564 Lucie Ave    Mercer Island-WA-98040  12/24/2013  1234-A-41632
1235  A    Alex   Smith    4554 Devon Ave    Chicago-IL-60563        12/24/2013  1235-A-41632
1235  A    Eli    Manning  5555 Stranz Lane  Dallas-TX-75213         12/24/2013  1235-A-41632
1233  B    John   Smith    4378 Anna St      Seattle-WA-98040        12/24/2013  1233-B-41632
1233  C    John   Doe      3564 Lucie Ave    Mercer Island-WA-98040  12/24/2013  1233-C-41632
1236  D    Alex   Smith    4554 Devon Ave    Chicago-IL-60563        12/24/2013  1236-D-41632
1236  E    Eli    Manning  5555 Stranz Lane  Dallas-TX-75213         12/24/2013  1236-E-41632
Below is the desired output
ID    JOB  FNAME  LNAME    ADDRESS1          ADDRESS2                DATE        Concatenated
1234  A    John   Smith    4378 Anna St      Seattle-WA-98040        12/24/2013  1234-A-41632
1234  A    John   Doe      3564 Lucie Ave    Mercer Island-WA-98040  12/24/2013  1234-A-41632
1235  A    Alex   Smith    4554 Devon Ave    Chicago-IL-60563        12/24/2013  1235-A-41632
1235  A    Eli    Manning  5555 Stranz Lane  Dallas-TX-75213         12/24/2013  1235-A-41632
1233  B    John   Smith    4378 Anna St      Seattle-WA-98040        12/24/2013  1233-B-41632
1237  C    John   Doe      3564 Lucie Ave    Mercer Island-WA-98040  12/24/2013  1237-C-41632
1236  D    Alex   Smith    4554 Devon Ave    Chicago-IL-60563        12/24/2013  1236-D-41632
1236  E    Eli    Manning  5555 Stranz Lane  Dallas-TX-75213         12/24/2013  1236-E-41632
The column "Concatenated" is where I'm trying to find a change and insert a row after every change.
Any help would be appreciated.
You really want to write a stored procedure, run either from your data source into your database or on a schedule within your database.
Here is the code:
BEGIN;
ALTER TABLE Test_table ADD COLUMN b1 DECIMAL(10,2);
UPDATE test_table SET romney_pct = CAST(romney AS DECIMAL(10,2)) / CAST(uspres_total AS DECIMAL(10,2));
COMMIT;
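SQL tables have no inherent row order, so "the next row" only exists if a column defines it. A sketch in Python's sqlite3, assuming a hypothetical LineNum column that records import order with no gaps: it finds each row whose Concatenated value differs from the following row's and emits a blank separator row after it, the same rule the VBA macro applies. This is a swapped-in, query-only approach (UNION ALL with a fractional sort key), not the stored procedure recommended above; it produces the separated output without modifying the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# LineNum is a hypothetical import-order column; without it "next row" is undefined.
cur.execute("CREATE TABLE Imported (LineNum INTEGER PRIMARY KEY, ID INTEGER, JOB TEXT, Concatenated TEXT)")
cur.executemany("INSERT INTO Imported VALUES (?, ?, ?, ?)", [
    (1, 1234, "A", "1234-A-41632"),
    (2, 1234, "A", "1234-A-41632"),
    (3, 1235, "A", "1235-A-41632"),
    (4, 1235, "A", "1235-A-41632"),
    (5, 1233, "B", "1233-B-41632"),
    (6, 1233, "C", "1233-C-41632"),
    (7, 1236, "D", "1236-D-41632"),
    (8, 1236, "E", "1236-E-41632"),
])

rows = cur.execute("""
    SELECT LineNum AS SortKey, ID, JOB, Concatenated FROM Imported
    UNION ALL
    -- one blank row after each line whose value differs from the next line's
    SELECT t.LineNum + 0.5, NULL, NULL, NULL
    FROM Imported t
    JOIN Imported nxt ON nxt.LineNum = t.LineNum + 1
    WHERE nxt.Concatenated <> t.Concatenated
    ORDER BY SortKey""").fetchall()
for r in rows:
    print(r)
```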

Returning only records that have matching fields in other records

I have a query that returns a list of customers and their addresses.
ID  FName  LName  Address1       City    Postcode
--------------------------------------------------------
1   James  Smith  1 Bank Street  London  W1C 1AA
2   Sarah  Jones  45 Moor Ave    London  SW1 1YH
3   Mary   Smith  1 Bank Street  London  W1C 1AA
4   Sean   Baker  17 White Blvd  London  SE3 7TH
5   Bob    Patel  58B Canal St   London  NW2 2TT
6   Seeta  Patel  58B Canal St   London  NW2 2TT
7   David  Hound  4 Main St      London  E11 8AB
I'm trying to produce another query from this data that selects a list of customers who are related or living together. The criteria for this would be the same Address1 and Postcode fields.
My question is: how can I produce a query that only selects records that have at least one other record with matching [Address1] and [Postcode]? I.e., in the above example, return only records 1, 3, 5 and 6.
Select * From
Customers c JOIN
(SELECT Address1, PostCode FROM Customers GROUP BY Address1, PostCode HAVING Count(1) > 1) c2
ON c.Address1 = c2.Address1 AND c.PostCode = c2.PostCode;
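The answer's query run against the question's sample data in Python's sqlite3 (an assumption; any engine with GROUP BY ... HAVING behaves the same). The derived table lists each Address1/Postcode pair that occurs more than once, and the join keeps only the customers at those addresses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("""CREATE TABLE Customers (ID INTEGER PRIMARY KEY, FName TEXT, LName TEXT,
               Address1 TEXT, City TEXT, Postcode TEXT)""")
cur.executemany("INSERT INTO Customers VALUES (?, ?, ?, ?, ?, ?)", [
    (1, "James", "Smith", "1 Bank Street", "London", "W1C 1AA"),
    (2, "Sarah", "Jones", "45 Moor Ave", "London", "SW1 1YH"),
    (3, "Mary", "Smith", "1 Bank Street", "London", "W1C 1AA"),
    (4, "Sean", "Baker", "17 White Blvd", "London", "SE3 7TH"),
    (5, "Bob", "Patel", "58B Canal St", "London", "NW2 2TT"),
    (6, "Seeta", "Patel", "58B Canal St", "London", "NW2 2TT"),
    (7, "David", "Hound", "4 Main St", "London", "E11 8AB"),
])

# Addresses shared by 2+ customers, joined back to pick out those customers.
rows = cur.execute("""
    SELECT c.*
    FROM Customers c
    JOIN (SELECT Address1, Postcode FROM Customers
          GROUP BY Address1, Postcode HAVING COUNT(*) > 1) c2
      ON c.Address1 = c2.Address1 AND c.Postcode = c2.Postcode
    ORDER BY c.ID""").fetchall()
ids = [r[0] for r in rows]
print(ids)
```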