Compare very different tables - sql

I have two tables that are read from separate files (.xlsx and .csv) and imported into MS Access. They are not in the same format, which is why I'm having such a difficult time with this.
Here is xlsxTable:
+--------------------------------------------------------------------------------------+
| ID | Name | SSN | SSN2 | Address |
+--------------------------------------------------------------------------------------+
| 00012345 | Robert Robin | ThisIsSSN | ThisIsSSN2 | 12345 StreetName St. CityName, KS |
| 00013245 | Pete Peters | ThisIsSSN | ThisIsSSN2 | 54321 StreetName St. CityName, MO |
| 00012358 | Mike Michaels| ThisIsSSN | ThisIsSSN2 | 69874 StreetName St. CityName, NY |
| 00098755 | Tim Timpson | ThisIsSSN | ThisIsSSN2 | 15987 StreetName St. CityName, KY |
| 00035784 | Tom Thompson | ThisIsSSN | ThisIsSSN2 | 95123 StreetName St. CityName, CA |
| 00012584 | Will Willers | ThisIsSSN | ThisIsSSN2 | 35789 StreetName St. CityName, WA |
| ........ | ........... | ......... | .......... | ................................. |
Here is my csvTable:
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| tracking_number | last_name | first_name | middle_name | suffix | alias_last_name | alias_first_name | alias_middle_name | alias_suffix | number | number_type | dob | street | city | state | zip | country | phone |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 135247 | Keeves | Michael | | Jr | | | | | ThisIsSSN | SSN/ITIN | 1/1/1990 | StreetName | CityName | NJ | | US | |
| 135248 | Jackson | Sue | Master | | | | | | ThisIsSSN | SSN/ITIN | 10/29/1980 | StreetName | CityName | NY | zip | US | |
| 135248 | Thomspon | Dolf | Laundry | | | | | | DriverNum | Driver'sLicense | 11/15/1962 | StreetName | CityName | KS | | US | |
| 135249 | Peters | Pete | | | Peters | Petey | | | ThisIsSSN | SSN/ITIN | 5/6/1975 | StreetName | CityName | PA | zip | US | |
| 135250 | Rogers | Steve | | | | | | | ThisIsSSN | SSN/ITIN | 12/25/1990 | StreetName | CityName | CT | zip | US | |
| 135250 | Nikolson | Jack | | Jr | | | | | DriverNum | Driver'sLicense | 8/5/1975 | StreetName | CityName | CA | zip | US | |
| 135251 | Keeves | Keanu | Neo | | | | | | ThisIsSSN | SSN/ITIN | 10/30/2000 | StreetName | CityName | TX | zip | US | |
| 135252 | Starch | Tony | | | | | | | ThisIsSSN | SSN/ITIN | 9/10/1975 | StreetName | CityName | NJ | | US | |
|...................|...............|................|...............|..........|...................|......................|........................|.................|.............|....................|............|.............|..............|.........|......|.........|.......|
| dba_name | number | number_type | incorporated | street | city | state | zip | country | phone | | | | | | | | |
| Mini Mart | 92585487 | EIN | | Street | CityName | state | zipNum | GT | | | | | | | | | |
| | 15987548 | EIN | | street | CityName | KS | zipNum | US | | | | | | | | | |
| Check Systems | 35854855 | EIN | | street | CityName | CA | zipNum | US | | | | | | | | | |
|...................|...............|................|...............|..........|...................|......................|........................|.................|.............|....................|............|.............|..............|.........|......|.........|.......|
The line starting with dba_name in the table above is an actual row in the data. For some reason, a second list with its own header starts partway through the file.
I have to query these tables, and if a name and SSN both match, I must take the name, address and SSN and do something with them (most likely put them into another table for export). I have already loaded both tables from the files.
I now need to iterate through them and find the matches. For the sake of the sample data, Pete Peters should match here since his data is in both tables. My expected output should look a lot like the first table:
| ID | Name | SSN | SSN2 | Address |
I currently have an MS Access database that contains these tables. However, given how the data is parsed, I'm not sure where to even start with the SQL. Performance-wise, this may be expensive, but I'm just looking for a way to get it working first.
How can I query these two very different tables and only pull the data that matches?

Access has a Find Duplicates Query Wizard. The fastest way to handle the problem is to combine the tables, either manually or with one or more queries, and then run the wizard on the combined data. To make a complicated task manageable, break it down into steps.
You might get the data from the CSV table with a query like:
SELECT csvTable.First_Name AS First_Name, csvTable.Last_Name AS Last_Name, csvTable.Number AS [Number]
FROM csvTable
GROUP BY csvTable.First_Name, csvTable.Last_Name, csvTable.Number
HAVING (((Count(csvTable.Number))>1));
then create a query with the same structure from the xlsx table:
SELECT Trim(Left([xlsxTable].[FullName],InStr([xlsxTable].[FullName]," "))) AS First_Name, Right([xlsxTable].[FullName],Len([xlsxTable].[FullName])-InStr([xlsxTable].[FullName]," ")) AS Last_Name, xlsxTable.SSN AS [Number]
FROM xlsxTable
GROUP BY Trim(Left([xlsxTable].[FullName],InStr([xlsxTable].[FullName]," "))), Right([xlsxTable].[FullName],Len([xlsxTable].[FullName])-InStr([xlsxTable].[FullName]," ")), xlsxTable.SSN
HAVING (((Count(xlsxTable.SSN))>1));
The HAVING Count > 1 clause does the work of finding the duplicates. Most of the rest is string manipulation that splits the full name into first and last name directly in the SQL. Then combine the two queries with a UNION ALL statement so you can run them together in the SQL pane:
SELECT csvTable.First_Name AS First_Name, csvTable.Last_Name AS Last_Name, csvTable.Number AS [Number]
FROM csvTable
GROUP BY csvTable.First_Name, csvTable.Last_Name, csvTable.Number
UNION ALL
SELECT Trim(Left([xlsxTable].[FullName],InStr([xlsxTable].[FullName]," "))) AS First_Name, Right([xlsxTable].[FullName],Len([xlsxTable].[FullName])-InStr([xlsxTable].[FullName]," ")) AS Last_Name, xlsxTable.SSN AS [Number]
FROM xlsxTable
GROUP BY Trim(Left([xlsxTable].[FullName],InStr([xlsxTable].[FullName]," "))), Right([xlsxTable].[FullName],Len([xlsxTable].[FullName])-InStr([xlsxTable].[FullName]," ")), xlsxTable.SSN;
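UNION ALL keeps duplicates while UNION omits them. I removed the HAVING clauses from the union: each half now contributes one row per distinct name and number, so a row that shows up more than once in the combined result is present in both tables. Save the union as a query (named [combine tables] here) and then use the Find Duplicates wizard on it, which produces something like: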
SELECT [combine tables].First_Name, [combine tables].Last_Name, [combine tables].Number
FROM [combine tables]
GROUP BY [combine tables].First_Name, [combine tables].Last_Name, [combine tables].Number
HAVING (((Count([combine tables].Number))>1));
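The union-then-count idea above can be tried outside Access. The following is a minimal sketch using Python's built-in sqlite3 module as a stand-in for Access, so the string functions (substr, instr, trim) are SQLite's rather than Access's Left, Right and InStr. The function name and the sample SSN values are made up for illustration.

```python
import sqlite3

def find_cross_table_matches(xlsx_rows, csv_rows):
    """Return (first_name, last_name, number) triples present in both tables."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE xlsxTable (FullName TEXT, SSN TEXT)")
    con.execute("CREATE TABLE csvTable (first_name TEXT, last_name TEXT, number TEXT)")
    con.executemany("INSERT INTO xlsxTable VALUES (?, ?)", xlsx_rows)
    con.executemany("INSERT INTO csvTable VALUES (?, ?, ?)", csv_rows)
    sql = """
        SELECT first_name, last_name, number
        FROM (
            -- one row per distinct person in each table
            SELECT DISTINCT first_name, last_name, number FROM csvTable
            UNION ALL
            SELECT DISTINCT
                   trim(substr(FullName, 1, instr(FullName, ' '))),
                   substr(FullName, instr(FullName, ' ') + 1),
                   SSN
            FROM xlsxTable
        )
        GROUP BY first_name, last_name, number
        HAVING COUNT(*) > 1          -- seen in both halves of the union
        ORDER BY last_name, first_name
    """
    rows = con.execute(sql).fetchall()
    con.close()
    return rows

# Placeholder data; the SSN strings are invented for this example.
xlsx_rows = [("Pete Peters", "SSN-A"), ("Robert Robin", "SSN-B")]
csv_rows = [("Pete", "Peters", "SSN-A"), ("Sue", "Jackson", "SSN-C")]
matches = find_cross_table_matches(xlsx_rows, csv_rows)
```

With this sample data only Pete Peters appears in both inputs, which mirrors the match the question expects.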

Related

MS Access: merging different data from different tables about same set of users into one table

I am tinkering with an old dataset. It consists of old logfiles of an electronic messageboard of a small club. It has great nostalgic value for a friend of mine (who was a member of this community), who has now asked me to make some sense of it.
Now, I have several tables in my database, which are all some sort of logs (messageboard logins, messages etc). Funny thing is that the dates seem to be correct, but there are no usernames or user IDs, only first names and surnames. (Many of these are most likely nicknames, but it doesn't matter.)
At my friend's request I want to "merge" all logfiles chronologically into one long table, which should eventually give him and his mates an overview of this club's activities.
So I will treat surnames and first names together as a user identifier, and I have created a unified value 'date' for each table (for logins it's a copy of the login date, for logouts it's a copy of the logout date, for posted messages it's a copy of the post date).
Now I would like to put data about users' actions into one table. It should be chronological (sorted by date). Is it possible to achieve it with an MS Access SQL query?
I have tables like these:
logins:
+---------+------+---------------------+---------------------+
| surname | name | login_date | date |
+---------+------+---------------------+---------------------+
| Smith | John | 1997-01-14_18:45:04 | 1997-01-14_18:45:04 |
| Parker | Mary | 1997-03-15_11:30:45 | 1997-03-15_11:30:45 |
| Smith | John | 1997-03-20_09:05:24 | 1997-03-20_09:05:24 |
+---------+------+---------------------+---------------------+
logouts:
+---------+------+---------------------+---------------------+
| surname | name | logout_date | date |
+---------+------+---------------------+---------------------+
| Smith | John | 1997-01-14_19:25:55 | 1997-01-14_19:25:55 |
| Parker | Mary | 1997-03-15_13:08:01 | 1997-03-15_13:08:01 |
| Smith | John | 1997-03-20_09:15:58 | 1997-03-20_09:15:58 |
+---------+------+---------------------+---------------------+
posted_messages:
+---------+------+---------------------+---------------+---------------------+
| surname | name | post_date | post_text | date |
+---------+------+---------------------+---------------+---------------------+
| Parker | Mary | 1997-03-15_12:30:56 | "Hello world" | 1997-03-15_12:30:56 |
| Smith | John | 1997-03-20_09:14:01 | "Hello hello" | 1997-03-20_09:14:01 |
+---------+------+---------------------+---------------+---------------------+
And my desired outcome would be something like:
+---------------------+---------+------+---------------------+---------------------+---------------------+---------------+
| date | surname | name | login_date | logout_date | post_date | post_text |
+---------------------+---------+------+---------------------+---------------------+---------------------+---------------+
| 1997-01-14_18:45:04 | Smith | John | 1997-01-14_18:45:04 | | | |
| 1997-01-14_19:25:55 | Smith | John | | 1997-01-14_19:25:55 | | |
| 1997-03-15_11:30:45 | Parker | Mary | 1997-03-15_11:30:45 | | | |
| 1997-03-15_12:30:56 | Parker | Mary | | | 1997-03-15_12:30:56 | "Hello world" |
| 1997-03-15_13:08:01 | Parker | Mary | | 1997-03-15_13:08:01 | | |
| 1997-03-20_09:05:24 | Smith | John | 1997-03-20_09:05:24 | | | |
| 1997-03-20_09:14:01 | Smith | John | | | 1997-03-20_09:14:01 | "Hello hello" |
| 1997-03-20_09:15:58 | Smith | John | | 1997-03-20_09:15:58 | | |
+---------------------+---------+------+---------------------+---------------------+---------------------+---------------+
You want union all:
select [date], surname, [name], login_date, null as logout_date, null as post_date, null as post_text
from logins
union all
select [date], surname, [name], null, logout_date, null, null
from logouts
union all
select [date], surname, [name], null, null, post_date, post_text
from posted_messages
order by [date];
You can either create a new table from this using select into, insert into an existing table using insert, or just save it as a query (the Access counterpart of a view).
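As a rough way to experiment with the union, here is a sketch that runs the same query shape in SQLite through Python's sqlite3 module, with an ORDER BY added for the chronological order the question asks for. SQLite is only a stand-in for Access, the function name is hypothetical, and the sort relies on the assumption that the timestamps are fixed-width strings that order correctly as text.

```python
import sqlite3

def build_timeline(logins, logouts, posts):
    """Merge the three logs into one list of rows sorted by date."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE logins (surname TEXT, name TEXT, login_date TEXT, "date" TEXT);
        CREATE TABLE logouts (surname TEXT, name TEXT, logout_date TEXT, "date" TEXT);
        CREATE TABLE posted_messages (surname TEXT, name TEXT, post_date TEXT,
                                      post_text TEXT, "date" TEXT);
    """)
    # The unified "date" column is a copy of each table's own timestamp.
    con.executemany("INSERT INTO logins VALUES (?, ?, ?, ?)",
                    [(s, n, t, t) for s, n, t in logins])
    con.executemany("INSERT INTO logouts VALUES (?, ?, ?, ?)",
                    [(s, n, t, t) for s, n, t in logouts])
    con.executemany("INSERT INTO posted_messages VALUES (?, ?, ?, ?, ?)",
                    [(s, n, t, text, t) for s, n, t, text in posts])
    rows = con.execute("""
        SELECT "date", surname, name, login_date,
               NULL AS logout_date, NULL AS post_date, NULL AS post_text
        FROM logins
        UNION ALL
        SELECT "date", surname, name, NULL, logout_date, NULL, NULL
        FROM logouts
        UNION ALL
        SELECT "date", surname, name, NULL, NULL, post_date, post_text
        FROM posted_messages
        ORDER BY 1   -- first output column, i.e. the unified date
    """).fetchall()
    con.close()
    return rows

timeline = build_timeline(
    logins=[("Smith", "John", "1997-01-14_18:45:04"),
            ("Parker", "Mary", "1997-03-15_11:30:45")],
    logouts=[("Smith", "John", "1997-01-14_19:25:55"),
             ("Parker", "Mary", "1997-03-15_13:08:01")],
    posts=[("Parker", "Mary", "1997-03-15_12:30:56", "Hello world")],
)
```

Each row keeps only the columns of its source table filled in; the others come back as None, matching the blank cells in the desired outcome.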

SQL query to find a partial string match that could include special characters

SQL query with special character ()
The original query (big thanks to GMB) can find any items in address (users table) that have a match in address (address_effect table).
The query works fine if address contains ',' but I can't seem to make it work if there is '()' in the address field.
Here is the sql query that's not working:
UPDATE users u
SET u.COUNT = (
SELECT COUNT(*) FROM address_effect a
WHERE FIND_IN_SET(a.address, REPLACE(u.address, ', ', ','')'))
)
Fyi, I'm testing this on my local system with XAMPP (using MariaDB).
I tried to escape '()' by prepending a backslash '\' but it doesn't help.
user table
+--------+-------------+---------------+--------------------------+--------+
| ID | firstname | lastname | address | count |
| | | | | |
+--------------------------------------------------------------------------+
| 1 | john | doe |james street, idaho, usa | |
| | | | | |
+--------------------------------------------------------------------------+
| 2 | cindy | smith |rollingwood av,lyn, canada| |
| | | | | |
+--------------------------------------------------------------------------+
| 3 | rita | chatsworth |arajo ct, alameda, cali | |
| | | | | |
+--------------------------------------------------------------------------+
| 4 | randy | plies |smith spring, lima, (peru)| |
| | | | | |
+--------------------------------------------------------------------------+
| 5 | Matt | gwalio |park lane, (atlanta), usa | |
| | | | | |
+--------------------------------------------------------------------------+
address_effect table
+---------+----------------+
|address |effect |
+---------+----------------+
|idaho |potato, tater |
+--------------------------+
|canada |cold, tundra |
+--------------------------+
|fremont | crowded |
+--------------------------+
|peru |alpaca |
+--------------------------+
|atlanta |peach, cnn |
+--------------------------+
|usa |big, hard |
+--------+-----------------+
I would suggest using regular expressions for this. It seems more general than fiddling with the string:
update users u
set count = (select count(*)
from address_effect ae
where u.address regexp concat('[[:<:]]', ae.address, '[[:>:]]')
);
The funky character class is MySQL's way of delineating a word boundary (I am more used to \W but MySQL doesn't support that).
Here is a db<>fiddle.
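To see what the word-boundary match does to the sample addresses, here is a small sketch in Python rather than SQL. It uses `\b` from Python's re module in place of the `[[:<:]]` and `[[:>:]]` markers, and it escapes each keyword with re.escape, which the SQL above does not do. The function name is made up for this example.

```python
import re

def count_keyword_matches(address, keywords):
    """Count the keywords that occur in the address as whole words."""
    return sum(
        1
        for kw in keywords
        if re.search(r"\b" + re.escape(kw) + r"\b", address, flags=re.IGNORECASE)
    )

# Keywords taken from the address_effect table in the question.
KEYWORDS = ["idaho", "canada", "fremont", "peru", "atlanta", "usa"]

peru_count = count_keyword_matches("smith spring, lima, (peru)", KEYWORDS)
atlanta_count = count_keyword_matches("park lane, (atlanta), usa", KEYWORDS)
```

Because parentheses are not word characters, "(peru)" still has a word boundary on both sides of "peru", so no special handling of the brackets is needed. One difference from FIND_IN_SET is that a keyword also matches inside a longer comma-separated item, for example a keyword "james" would match "james street".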
Just like you replace the space after each comma with just a comma, use REPLACE() to remove the chars '(' and ')':
FIND_IN_SET(a.address, REPLACE(REPLACE(REPLACE(u.address, ', ', ','), '(', ''), ')', ''))
See the demo.
Results:
| ID | firstname | lastname | address | count |
| --- | --------- | ---------- | -------------------------- | ----- |
| 1 | john | doe | james street, idaho, usa | 2 |
| 2 | cindy | smith | rollingwood av,lyn, canada | 1 |
| 3 | rita | chatsworth | arajo ct, alameda, cali | 0 |
| 4 | randy | plies | smith spring, lima, (peru) | 1 |
| 5 | Matt | gwalio | park lane, (atlanta), usa | 2 |
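The nested REPLACE calls amount to normalising each comma-separated item before looking it up. A minimal Python sketch of that idea follows; the function names are hypothetical, and unlike REPLACE it strips spaces and parentheses only from the ends of each item, which is enough for the sample data.

```python
def address_items(address):
    """Split on commas and strip surrounding spaces and parentheses."""
    return {item.strip(" ()").lower() for item in address.split(",")}

def count_item_matches(address, keywords):
    """Count keywords that equal one whole comma-separated item."""
    items = address_items(address)
    return sum(1 for kw in keywords if kw.lower() in items)

# Keywords taken from the address_effect table in the question.
KEYWORDS = ["idaho", "canada", "fremont", "peru", "atlanta", "usa"]

counts = {
    address: count_item_matches(address, KEYWORDS)
    for address in [
        "james street, idaho, usa",
        "rollingwood av,lyn, canada",
        "arajo ct, alameda, cali",
        "smith spring, lima, (peru)",
        "park lane, (atlanta), usa",
    ]
}
```

As with FIND_IN_SET, a keyword has to equal a whole item, so "james" on its own would not match the item "james street".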

Updating table based on the results of previous query

How can I update the table based on the results of the previous query?
The original query (big thanks to GMB) can find any items in address (users table) that have a match in address (address_effect table).
From the result of this query, I want to find the count of address in the address_effect table and add it into a new column in the table “users”. For example, john doe has a match with idaho and usa in the address column so it’ll show a count of ‘2’ in the count column.
Fyi, I'm testing this on my local system with XAMPP (using MariaDB).
user table
+--------+-------------+---------------+--------------------------+--------+
| ID | firstname | lastname | address | count |
| | | | | |
+--------------------------------------------------------------------------+
| 1 | john | doe |james street, idaho, usa | |
| | | | | |
+--------------------------------------------------------------------------+
| 2 | cindy | smith |rollingwood av,lyn, canada| |
| | | | | |
+--------------------------------------------------------------------------+
| 3 | rita | chatsworth |arajo ct, alameda, cali | |
| | | | | |
+--------------------------------------------------------------------------+
| 4 | randy | plies |smith spring, lima, peru | |
| | | | | |
+--------------------------------------------------------------------------+
| 5 | Matt | gwalio |park lane, atlanta, usa | |
| | | | | |
+--------------------------------------------------------------------------+
address_effect table
+---------+----------------+
|address |effect |
+---------+----------------+
|idaho |potato, tater |
+--------------------------+
|canada |cold, tundra |
+--------------------------+
|fremont | crowded |
+--------------------------+
|peru |alpaca |
+--------------------------+
|atlanta |peach, cnn |
+--------------------------+
|usa |big, hard |
+--------+-----------------+
Use a correlated subquery which returns the number of matches:
UPDATE users u
SET u.count = (
SELECT COUNT(*)
FROM address_effect a
WHERE FIND_IN_SET(a.address, REPLACE(u.address, ', ', ','))
)
See the demo.
Results:
> ID | firstname | lastname | address | count
> -: | :-------- | :--------- | :------------------------- | ----:
> 1 | john | doe | james street, idaho, usa | 2
> 2 | cindy | smith | rollingwood av,lyn, canada | 1
> 3 | rita | chatsworth | arajo ct, alameda, cali | 0
> 4 | randy | plies | smith spring, lima, peru | 1
> 5 | Matt | gwalio | park lane, atlanta, usa | 2
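Note: I checked this in MySQL, but not in MariaDB.
The count column of the users table can also be updated with an UPDATE statement that uses INNER JOIN. Modify the original query to use GROUP BY and join its result back to users. Users with no matching address are not touched by this update, so their count keeps its previous value instead of being set to 0.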
UPDATE users AS u
INNER JOIN
(
-- your original query modified
SELECT u.ID AS ID, count(u.ID) AS count
FROM users u
INNER JOIN address_effect a
ON FIND_IN_SET(a.address, REPLACE(u.address, ', ', ','))
GROUP BY u.ID
) AS c ON u.ID=c.ID
SET u.count=c.count;
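The correlated-subquery form of the update can be tried without a MySQL or MariaDB server. The sketch below uses Python's sqlite3 module and registers a small Python function under the name FIND_IN_SET, since SQLite has no such built-in. That function is only a rough, case-sensitive stand-in for the real one, and update_match_counts is a made-up helper name.

```python
import sqlite3

def find_in_set(needle, haystack):
    """Rough stand-in for FIND_IN_SET: 1-based position in the list, or 0."""
    if needle is None or haystack is None:
        return None
    items = haystack.split(",")
    return items.index(needle) + 1 if needle in items else 0

def update_match_counts(users, keywords):
    """Run the correlated-subquery UPDATE and return {ID: count}."""
    con = sqlite3.connect(":memory:")
    con.create_function("FIND_IN_SET", 2, find_in_set)
    con.execute('CREATE TABLE users (ID INTEGER PRIMARY KEY, address TEXT, "count" INTEGER)')
    con.execute("CREATE TABLE address_effect (address TEXT)")
    con.executemany("INSERT INTO users (ID, address) VALUES (?, ?)", users)
    con.executemany("INSERT INTO address_effect VALUES (?)", [(k,) for k in keywords])
    # The subquery is evaluated once per users row.
    con.execute("""
        UPDATE users
        SET "count" = (
            SELECT COUNT(*)
            FROM address_effect AS a
            WHERE FIND_IN_SET(a.address, REPLACE(users.address, ', ', ','))
        )
    """)
    result = dict(con.execute('SELECT ID, "count" FROM users ORDER BY ID'))
    con.close()
    return result

counts = update_match_counts(
    users=[(1, "james street, idaho, usa"),
           (2, "rollingwood av,lyn, canada"),
           (3, "arajo ct, alameda, cali"),
           (4, "smith spring, lima, peru"),
           (5, "park lane, atlanta, usa")],
    keywords=["idaho", "canada", "fremont", "peru", "atlanta", "usa"],
)
```

Because the subquery runs for every row, a user without any match ends up with a count of 0 rather than being skipped.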

Splitting a table on comma separated emails in Big Query

I have a table with following columns (The email address are comma separated):
+---------+----------+------------+---------------------------------------------+---------+
| Sr. No. | Domain | Store Name | Email | Country |
+---------+----------+------------+---------------------------------------------+---------+
| 1. | kkp.com | KKP | den#kkp.com, info#kkp.com, reno#kkp.com | US |
| 2. | lln.com | LLN | silo#lln.com | UK |
| 3. | ddr.com | DDR | info#ddr.com, dave#ddr.com | US |
| 4. | cpp.com | CPP | hello#ccp.com, info#ccp.com, stelo#ccp.com | CN |
+---------+----------+------------+---------------------------------------------+---------+
I want the output with Email in separate Columns:
+---------+----------+------------+---------------+---------------+---------------+---------+---------+
| Sr. No. | Domain | Store Name | Email 1 | Email 2 | Email 3 | Email N | Country |
|---------+----------+------------+---------------+---------------+---------------+---------+---------+
| 1. | kkp.com | KKP | den#kkp.com | info#kkp.com | reno#kkp.com | ....... | US |
| 2. | lln.com | LLN | silo#lln.com | | | ....... | UK |
| 3. | ddr.com | DDR | info#ddr.com | dave#ddr.com | | ....... | US |
| 4. | cpp.com | CPP | hello#ccp.com | info#ccp.com | stelo#ccp.com | ....... | CN |
+---------+----------+------------+---------------+---------------+---------------+---------+---------+
Can someone please help a beginner in SQL and BigQuery?
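The reshaping being asked for can be sketched outside BigQuery to make the target concrete. The Python below is only an illustration, not a BigQuery answer: split_emails is a made-up helper, and the rows are plain dictionaries whose email values are copied as they appear in the table above. In BigQuery itself, the SPLIT function together with SAFE_OFFSET array access would be the likely building blocks.

```python
def split_emails(rows, email_key="Email"):
    """Spread a comma-separated email field over 'Email 1'..'Email N' keys."""
    split = [
        [part.strip() for part in row[email_key].split(",") if part.strip()]
        for row in rows
    ]
    # N is the largest number of addresses found in any row.
    width = max((len(emails) for emails in split), default=0)
    result = []
    for row, emails in zip(rows, split):
        wide = {key: value for key, value in row.items() if key != email_key}
        for i in range(width):
            wide[f"Email {i + 1}"] = emails[i] if i < len(emails) else ""
        result.append(wide)
    return result

rows = [
    {"Domain": "kkp.com", "Store Name": "KKP",
     "Email": "den#kkp.com, info#kkp.com, reno#kkp.com", "Country": "US"},
    {"Domain": "lln.com", "Store Name": "LLN",
     "Email": "silo#lln.com", "Country": "UK"},
]
wide_rows = split_emails(rows)
```

Rows with fewer addresses than the widest row get empty strings in the remaining Email columns, as in the desired output.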

How to execute this query to compare dates

Write a query to display the students who are older than 'Balakrishnan'. Sort the results based on firstname in ascending order.
The output should look like this
+--------+-----------+----------+-------------+------------+-----------+
| STUDID | FIRSTNAME | LASTNAME | STREET | CITY | DOB |
+--------+-----------+----------+-------------+------------+-----------+
| 3009 | Abdul | Rahman | HAL | Bangalore | 19-JAN-88 |
| 3002 | Anand | Kumar | Indiranagar | Bangalore | 19-JAN-88 |
| 3001 | Dileep | Kumar | Jai Nagar | Bangalore | 10-MAR-89 |
| 3004 | Gowri | Shankar | Gandhipuram | Coimbatore | 22-DEC-87 |
| 3008 | John | Dravid | Mylapore | Chennai | 15-SEP-87 |
| 3006 | Prem | Kumar | Ramnagar | Coimbatore | 17-MAY-87 |
| 3007 | Rahul | Dravid | KKNagar | Chennai | 08-OCT-87 |
+--------+-----------+----------+-------------+------------+-----------+
Try this; it may help you:
SELECT * FROM TABLE_NAME
WHERE DOB < TO_DATE('DOB_of_Balakrishnan','DD-MM-YYYY')
ORDER BY FIRSTNAME;
I am using Oracle 11g.
Since the DOB of Balakrishnan is not provided, look it up with a subquery instead:
SELECT *
FROM table_name
WHERE dob<(SELECT dob
FROM table_name
WHERE LOWER(firstname)='balakrishnan')
ORDER BY firstname;
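The subquery approach can be checked quickly with Python's sqlite3 module standing in for Oracle. In this sketch the table name student, the helper name, Balakrishnan's date of birth and the extra student "Zara" are all assumptions made for the example, and dates are stored as ISO strings so that text comparison matches date order.

```python
import sqlite3

def students_older_than(students, firstname):
    """Return students born before the named student, sorted by first name."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE student (studid INTEGER, firstname TEXT, dob TEXT)")
    con.executemany("INSERT INTO student VALUES (?, ?, ?)", students)
    rows = con.execute(
        """
        SELECT studid, firstname, dob
        FROM student
        WHERE dob < (SELECT dob
                     FROM student
                     WHERE LOWER(firstname) = LOWER(?))
        ORDER BY firstname
        """,
        (firstname,),
    ).fetchall()
    con.close()
    return rows

students = [
    (3009, "Abdul", "1988-01-19"),
    (3001, "Dileep", "1989-03-10"),
    (3006, "Prem", "1987-05-17"),
    (3005, "Balakrishnan", "1989-06-01"),  # hypothetical DOB
    (3010, "Zara", "1990-02-02"),          # hypothetical younger student
]
older = students_older_than(students, "Balakrishnan")
```

An earlier date of birth means an older student, which is why the comparison uses `<`. If more than one student shares the first name, Oracle raises an error for the scalar subquery, so a unique key would be a safer filter in practice.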