MS Access - trying to find duplicates across 4 tables based on Column1 and Column2

I'm trying to find duplicates across 4 tables in MS Access based on the values in Column1 and Column2. I would also like the resulting query to show Column3, Column4 and Column5 for easy review. I followed a YouTube video on union queries and got that part working, but that's as far as I can go. I tried to follow some of the existing answers but can't make them work. Please note that I have no programming knowledge. Thanks very much in advance!
Column1 = Unique reference
Column2 = Loss date (DOL)
A duplicate is a row that has the same unique reference and the same DOL as another row. This can happen within one table or across tables: for example, one entry is in Table2019 and another is in Table2022, or two entries are in Table2019 with four more spread across the other tables.
SELECT [t2019].ID, [t2019].[ClaimNo], [t2019].DOL, [t2019].[Amount], [t2019].[Cause], [t2019].[Ref], [t2019].[Regn], [t2019].Remarks
FROM [t2019]
UNION
SELECT [t2020].ID, [t2020].[ClaimNo], [t2020].DOL, [t2020].[Amount], [t2020].[Cause], [t2020].[Ref], [t2020].[Regn], [t2020].Remarks
FROM [t2020]
UNION
SELECT [t2021].ID, [t2021].[ClaimNo], [t2021].DOL, [t2021].[Amount], [t2021].[Cause], [t2021].[Ref], [t2021].[Regn], [t2021].Remarks
FROM [t2021]
UNION
SELECT [t2022].ID, [t2022].[ClaimNo], [t2022].DOL, [t2022].[Amount], [t2022].[Cause], [t2022].[Ref], [t2022].[Regn], [t2022].Remarks
FROM [t2022];

Access has a wizard to help write the relatively difficult SQL for finding duplicate records. So first gather all the records that need to be searched for duplicates, then use the wizard.
To gather the records, open the query designer, switch to the SQL pane, select Union as the query type, and adapt the union query from the question so that each UNION becomes UNION ALL.
Unfortunately, there is no graphical interface to help, so get typing, and don't forget the semicolon at the end. UNION combines the results of SELECT statements, so here we are combining everything from all the tables. The ALL is important because, by itself, UNION drops any row where every column exactly matches a previous row. We are looking for duplicates, so we add ALL to keep those rows.
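To see the difference between UNION and UNION ALL in isolation, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for Access (an assumption, made only so the example is self-contained), two cut-down tables with just ClaimNo and DOL, and invented sample rows.

```python
import sqlite3

# In-memory SQLite database standing in for the Access file (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t2019 (ClaimNo TEXT, DOL TEXT);
    CREATE TABLE t2020 (ClaimNo TEXT, DOL TEXT);
    -- Invented sample data: claim A1 appears in both tables with the same DOL.
    INSERT INTO t2019 VALUES ('A1', '2019-03-01'), ('A2', '2019-05-10');
    INSERT INTO t2020 VALUES ('A1', '2019-03-01'), ('A3', '2020-01-15');
""")

# UNION removes rows that exactly match an earlier row.
union_rows = conn.execute(
    "SELECT ClaimNo, DOL FROM t2019 UNION SELECT ClaimNo, DOL FROM t2020"
).fetchall()

# UNION ALL keeps every row, including the repeated A1 entry.
union_all_rows = conn.execute(
    "SELECT ClaimNo, DOL FROM t2019 UNION ALL SELECT ClaimNo, DOL FROM t2020"
).fetchall()

print(len(union_rows))      # 3
print(len(union_all_rows))  # 4
```

With plain UNION the repeated A1 row is silently dropped, which is exactly the row a duplicate search needs to see.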
When you have all the rows, go to Query Wizard on the Create tab and run the Find Duplicates Query Wizard against the saved union query (Query1 in my example).
Here is the resulting SQL for my example data:
SELECT Query1.[ID], Query1.[DOL], Query1.[ClaimNo], Query1.[Amount], Query1.[Cause], Query1.[Ref], Query1.[Regn], Query1.[Remarks]
FROM Query1
WHERE (((Query1.[ID]) In (SELECT [ID] FROM [Query1] As Tmp GROUP BY [ID],[DOL] HAVING Count(*)>1 And [DOL] = [Query1].[DOL])))
ORDER BY Query1.[ID], Query1.[DOL]
Note:
In Access, ID is by default an AutoNumber primary key, so it looks suspicious here. If the default settings are intact and you are entering data in Access, every table starts at ID 1 and the same IDs appear in every table. I would normally combine all these year tables into a single table with a year column, which also avoids the union query. I would only use per-year tables if I had millions of records and couldn't afford the space for a year column.
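As a rough sketch of the single-table idea in the note, the example below again uses Python's sqlite3 as a stand-in for Access. The table name claims, the ClaimYear column and the sample rows are hypothetical, and it assumes ClaimNo is the unique reference from the question. Duplicates are found by grouping on ClaimNo and DOL and joining back to show the full rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical combined table: one row per claim, with the year as a column.
    CREATE TABLE claims (
        ID INTEGER PRIMARY KEY AUTOINCREMENT,
        ClaimYear INTEGER,
        ClaimNo TEXT,
        DOL TEXT,
        Amount REAL
    );
    -- Invented sample data: A1 / 2019-03-01 was entered in both 2019 and 2022.
    INSERT INTO claims (ClaimYear, ClaimNo, DOL, Amount) VALUES
        (2019, 'A1', '2019-03-01', 100.0),
        (2019, 'A2', '2019-05-10', 250.0),
        (2022, 'A1', '2019-03-01', 100.0),
        (2022, 'A3', '2022-02-02', 75.0);
""")

# Find every (ClaimNo, DOL) pair that occurs more than once,
# then join back to list all columns of the offending rows.
duplicates = conn.execute("""
    SELECT c.ID, c.ClaimYear, c.ClaimNo, c.DOL, c.Amount
    FROM claims AS c
    INNER JOIN (
        SELECT ClaimNo, DOL
        FROM claims
        GROUP BY ClaimNo, DOL
        HAVING COUNT(*) > 1
    ) AS d ON c.ClaimNo = d.ClaimNo AND c.DOL = d.DOL
    ORDER BY c.ClaimNo, c.DOL, c.ID
""").fetchall()

for row in duplicates:
    print(row)
```

Because the AutoNumber ID is unique across the whole table here, it no longer takes part in the duplicate test; only the reference and the loss date do.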

Related

What does the SELECT function in SQL actually produce? Does it produce a new table by default?

I am struggling to understand what the output of SELECT is meant to be in SQL (I am using MS Access), and what criteria, if any, this output needs to satisfy. As a result, I don't understand why some queries work and others don't. I know it retrieves data from a table, does calculations with it and displays it. But I don't understand the "inner" workings of the SELECT function. For instance, what is the name of the data structure / entity it displays? Is it a "new" table?
For example, suppose I have a table called "table_name" with 5 columns. One of the columns is called "column_3", and there are 20 records.
SELECT column_3, COUNT(*) AS Count
FROM table_name;
Why does this query fail to run? Logically, I would expect it to display two columns: the first, "column_3", containing 20 rows with the relevant data, and the second, "Count", containing just one non-empty row (displaying 20), with the other 19 rows empty (or maybe NULL).
Is it because SELECT is meant to produce an equal number of rows for each column?

Your questions come down to a basic understanding of SQL. SELECT statements do not create tables; they return a virtual result set. Nothing is persisted unless you turn the statement into an INSERT ... SELECT (or, in Access, a make-table query).
In your example, you need to tell the SQL engine what you want a count of. COUNT(*) is an aggregate, so every other column in the SELECT list must also appear in a GROUP BY clause. Because you selected column_3, you need to write:
SELECT column_3, COUNT(*) AS Count
FROM table_name
GROUP BY column_3
If you wanted a count of all the rows, simply:
SELECT COUNT(*) FROM table_name
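To make the two result shapes concrete, here is a small sketch using Python's sqlite3 module rather than Access (an assumption for the sake of a runnable example; the sample values are invented). SQLite is more lenient than Access about selecting a non-grouped column next to an aggregate, so the sketch only shows the two working forms.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (column_3 TEXT)")
# Invented sample data: five rows, three distinct values.
conn.executemany(
    "INSERT INTO table_name (column_3) VALUES (?)",
    [("red",), ("red",), ("blue",), ("green",), ("red",)],
)

# One output row per distinct column_3 value, each with its own count.
per_value = conn.execute(
    "SELECT column_3, COUNT(*) AS RowCount FROM table_name "
    "GROUP BY column_3 ORDER BY column_3"
).fetchall()

# A single output row holding the count of all rows.
total = conn.execute("SELECT COUNT(*) FROM table_name").fetchone()[0]

print(per_value)  # [('blue', 1), ('green', 1), ('red', 3)]
print(total)      # 5
```

Either way the result set is rectangular: every output row has a value for every selected column, which is why a 20-row column cannot sit beside a single-row count.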

SQL: Combine tables and order all rows / Avoid storing lots of null data

Typically, a JOIN in SQL requires you to join two tables ON a specific column, and the matching rows then get merged. That isn't what I'm looking for.
Is it possible to combine tables in SQL in a way that lets you ORDER BY columns with the same name, such that x rows are returned, where x = the sum of the rows in table 1 and table 2?
To hopefully clarify what I mean, here's an example query:
SELECT * FROM (combined Real and Placeholder items)
ORDER BY StartDate, OnDayIndex
and here's what results might look like:
ID OnDayIndex StartDate ItemType Name TemplatePointer
12308 2 1996-09-18 Real Actual Name Null
10309 11 1996-09-19 Placeholder Null 123
30310 5 1996-09-20 Real Actual Name Null
30410 6 1996-09-20 Placeholder Null 456
My use case is a calendar application with recurring events. To save space, it doesn't make sense to store every recurrence of an event. If it weren't for the particulars of my use case, I'd just store a template with a rule and generate recurring events when they are viewed, except for one-off events. The problem is that the calendar app I'm working on allows you to move items around within the day they're on and saves the way you order them. I'm already using a ranked model gem (link here: https://github.com/mixonic/ranked-model) to cut down on the number of writes needed to update the "onDayIndex". The template approach on its own turns into a bit of a nightmare when "onDayIndex" is factored in... (I could say more...)
So I'd like to store slimmed down 'Placeholder' items that store the items' position and a pointer to template, perhaps in a separate table if possible.
If this isn't possible, an alternative approach I've considered for conserving space is moving most columns from the Items table to an ItemData table, and storing an ItemDataID on Items.
But I'd really like to know whether it is possible, as I'm pretty junior in SQL, and I'd welcome any other vital information I may be missing.
I'm using Rails with a Postgres database.

Are you talking about using UNION / UNION ALL to stack result sets on top of each other, but where the sources have different columns?
If so, you need to fill in the 'missing' columns (you can only UNION two sets if their signatures match, or can be coerced to match).
For example...
SELECT col1, col2, NULL as col3
FROM tbl1
UNION ALL
SELECT col1, NULL AS col2, col3
FROM tbl2
Note: UNION expends additional resources to remove duplicates from the results. Use UNION ALL if such effort is wasted.
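Applied to the calendar example from the question, the padding idea might look like the sketch below. It runs on Python's sqlite3 module instead of Postgres (an assumption for self-containment; this particular statement is plain SQL and should carry over largely unchanged). The table names real_items and placeholder_items are hypothetical; the rows are the sample results from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical split: full items in one table, slim placeholders in another.
    CREATE TABLE real_items (
        ID INTEGER, OnDayIndex INTEGER, StartDate TEXT, Name TEXT
    );
    CREATE TABLE placeholder_items (
        ID INTEGER, OnDayIndex INTEGER, StartDate TEXT, TemplatePointer INTEGER
    );
    INSERT INTO real_items VALUES
        (12308, 2, '1996-09-18', 'Actual Name'),
        (30310, 5, '1996-09-20', 'Actual Name');
    INSERT INTO placeholder_items VALUES
        (10309, 11, '1996-09-19', 123),
        (30410, 6, '1996-09-20', 456);
""")

# Each branch pads the column it lacks with NULL so the signatures match;
# the ORDER BY at the end applies to the combined result.
rows = conn.execute("""
    SELECT ID, OnDayIndex, StartDate, 'Real' AS ItemType,
           Name, NULL AS TemplatePointer
    FROM real_items
    UNION ALL
    SELECT ID, OnDayIndex, StartDate, 'Placeholder' AS ItemType,
           NULL AS Name, TemplatePointer
    FROM placeholder_items
    ORDER BY StartDate, OnDayIndex
""").fetchall()

for row in rows:
    print(row)
```

The number of rows returned is the sum of the rows in both tables, and the NULLs exist only in the result set, not in storage.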

Best way to compare two tables in SQL by matching string?

I have a program whose goal is to take data from an API and capture the differences in the data from minute to minute. It involves three tables: Table 1 (for new data), Table 2 (for the previous minute's data), and a Results table (for the results).
The sequence of the program is like this:
Update table 1 -> Calculate the differences from table 2 and update a "Results" table with the differences -> Copy table 1 to table 2.
Then it repeats! It's simple and it works.
Here is my SQL query:
Insert into Results (symbol, bid, ask, description, Vol_Dif, Price_Dif, Time) Select * FROM(
Select symbol, bid, ask, description, Vol_Dif, Price_Dif, '$now' as Time FROM (
Select t1.symbol, t1.bid, t1.ask, t1.description, (t1.volume - t2.volume) AS Vol_Dif, (t1.totalPrice - t2.totalPrice) AS Price_Dif
FROM `Table_1` t1
Inner Join (
Select id, volume, ask, totalPrice FROM Table_2) t2
ON t2.id = t1.id) as test
) as result
The tables are identical in structure. The primary key is the auto-incrementing 'id' field, and as you can see, I am matching rows between the two tables on equal 'id' values.
The PROBLEM is that the API seems to be inconsistent. One API call will have 50,000 entries. The next one will have 51,000 entries. And the new entries are not just added to the end or to the beginning; they are mixed into the middle.
So comparing on equal IDs means I am comparing entries for DIFFERENT data whenever the API calls return a different number of entries.
The data that I am trying to get the differences of is the 'bid', 'ask', 'Vol_Dif' and 'Price_Dif' from minute to minute. The same 'symbol' appears many times, so I can't match on it. The ONLY other way to match entries from table to table, besides the IDs, would be matching the "description" fields.
I have tried this. The script is almost the same as above except the end of the query is
ON t2.description = t1.description
The problem is that matching on the description fields takes 3 minutes for 50,000 entries, whereas matching on IDs takes 1 second.
Is there a better, faster way to do what I'm trying to do? Thanks in advance. Any help is appreciated.

SQL Developer: Select query results has the expected row but doesn't display it on the grid

I am using SQL Developer 3.2.2 to query an Oracle 12 database. I have a select query that I expect to pick up a certain row. The results of this query are moved to a global temp table for further processing. But when I query the newly created temp table for that row using its key, the query doesn't find it.
I initially thought that my query had a problem and wasn't picking up the row in the first place, so I started debugging the query. When I ran the query separately in SQL Developer and looked for the row by applying a filter on the key column, it showed the row. But when I sort by the key column and manually look for the row in the grid, I don't see it. I believe this is the same reason why this particular row isn't copied over to the temp table. This is happening to quite a few rows in the database. Has anyone experienced this problem before?
The query is a simple one with just two columns, UserID and LocationID. It does a union over multiple sub-queries.
select distinct * from (
SELECT distinct UserID, LocationID
FROM TRANSACTION
WHERE "Deleted" = 0 and "TransactionType" in ('E1513','E1514')
AND "Date" <= '31-DEC-2016'
UNION
SELECT distinct UserID, LocationID
FROM FORMHIS
WHERE "FormID" in ('358465','358455')
AND "Date" <= '31-DEC-2016'
)
The output of the above query is missing a few rows that I am sure should be in the results.

Get fields from one column to another in Access

Below I have a table where I need to move fields from one column into three columns.
This is how I would like the data to end up:
Column1: Music
Column2: com.sec.android.app.music
Column3: com.sec.android.app.music.MusicActionTabActivity

Give the table a numeric AutoNumber id.
Remove the rows with no data using a select that filters out blank or null values.
Find the records with no dot (period) in the content using a select.
Use the previous query as a source, and use its id to find the next record (id + 1) and the one after that (id + 2).
Build a table to hold the new structure, and use the query as a source to insert the newly created data into that table with the 3-column structure.
This is an example using SQL Server.
[Images not preserved: test table design, data in table, query]
Read the query from the inside out. The innermost query cleans out the null records. The second query then finds the records without a dot; these records are the reference points for finding the two related records. The id of each record without a dot is then used in the select list, adding 1 to find the next record and adding 2 to find the one after it. Now you only need to create a table and insert this data into it, using this query as the source.
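Since the original query was not preserved, the following is only a sketch of the steps described above, not the answerer's actual SQL. It uses Python's sqlite3 module instead of SQL Server or Access (an assumption), hypothetical table names raw_data and result, the sample values from the question, and one invented second group (Clock) to show that several groups are handled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Step 1: the source table gets an auto-numbered id.
    CREATE TABLE raw_data (id INTEGER PRIMARY KEY AUTOINCREMENT, content TEXT);
    INSERT INTO raw_data (content) VALUES
        ('Music'),
        ('com.sec.android.app.music'),
        ('com.sec.android.app.music.MusicActionTabActivity'),
        (''),
        (NULL),
        ('Clock'),
        ('com.example.clock'),
        ('com.example.clock.ClockActivity');

    -- Step 5: target table with the 3-column structure.
    CREATE TABLE result (Column1 TEXT, Column2 TEXT, Column3 TEXT);
""")

conn.execute("""
    INSERT INTO result (Column1, Column2, Column3)
    SELECT head.content,
           -- Step 4: the next two rows belong to the same group.
           (SELECT content FROM raw_data WHERE id = head.id + 1),
           (SELECT content FROM raw_data WHERE id = head.id + 2)
    FROM (
        -- Step 3: rows without a dot start a new group.
        SELECT id, content
        FROM (
            -- Step 2: drop blank and null rows.
            SELECT id, content FROM raw_data
            WHERE content IS NOT NULL AND TRIM(content) <> ''
        ) AS cleaned
        WHERE content NOT LIKE '%.%'
    ) AS head
    ORDER BY head.id
""")

rows = conn.execute(
    "SELECT Column1, Column2, Column3 FROM result ORDER BY rowid"
).fetchall()
for row in rows:
    print(row)
```

The id + 1 and id + 2 lookups rely on the three rows of each group being numbered consecutively, so blank rows may sit between groups but not inside one.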
