I have data in an Excel spreadsheet that I import into Access 2007. There is a candidate key (CN). For those rows with the same CN, the data is different for all columns. Example below (real data has 100 columns and MsgNum might vary more often, haven't confirmed this pattern with other instances yet, so although I tried to select on it, the solution should probably ignore that the combination of CN and MsgNum could be unique).
Date | CN | MsgNum
2012-01-03 111-111-1111 101
2012-01-04 222-222-2222 101
2012-01-05 222-222-2222 202
2012-01-05 333-333-3333 101
2012-01-05 333-333-3333 202
2012-01-04 444-444-4444 101
2012-01-04 444-444-4444 101
I do not have access to SQL Server. All I have is Access 2007. I don't want to use Excel's remove duplicates procedure because the data that gets to me comes from Access before being exported to Excel, so I'm trying to find a solution to remove the duplicates through Access.
Using SQL in Query Design in Access, I have tried using a subquery in the WHERE clause that groups by the CN keeping those with a count of 1, but that removes all instances and does not keep at least one.
I tried selecting just two columns (CN and MIN(MsgNum))--grouping appropriately--and that gives me what I want, but when I run it with all the columns specified (100 columns in all), I get duplicates still.
I tried the Query Wizard Find Duplicates for a single column and to return the rest of the columns, that works in getting the duplicates isolated in a view. Since I cannot setup any primary keys, I am not sure how to join the tables. When running the previous MIN query with all columns, I get the same problem as before.
I was trying to setup something in the WHERE clause that compared the combination of two columns, but I read that that cannot be done. So, I am at a lost on how to solve this issue where there is a candidate key but the information in the records for the duplicates on this candidate key column are different. What I want to have done is what Excel 2007's Remove Duplicates procedure does where duplicates on one column can be removed and retain others.
Related
I have a table that looks like this:
Sku_Code Channel Rank Category Website Date
123 US 28 Toys www.foo.com 2021-06-07
123 US 13 Games www.lolo.com 2021-06-07
328 CA 12 Toys www.lo.com 2021-05-12
123 US 2 Games www.foo.com 2021-06-05
I would like to pivot this table so all ID information falls in one row...like this:
Sku_Code Channel Category Category_1 Website Website_1 Date
123 US Toys Games www.foo.com www.lolo.com 2021-06-07
328 CA Toys www.lo.com 2021-05-12
123 US Games www.foo.com 2021-06-05
It's a fairly large table so wondering whats the best/fastest way to do this? I know there is a pivot function I could use but do not now how to apply it in this situation.
I'm fairly new to SQL so any help would be appreciated.
It's not clear whether an ID can have an arbitrary number of categories. If so, this is not possible in normal SQL alone. The SQL language has a strict requirement for the number and types of columns to be known at query compile time, before looking at any data.
That doesn't mean what you want to do can't happen at all... just that the solution will be more involved. For example, you may need to do the pivot in your reporting tool or client code. The other alternative is dynamic sql over three steps: First, run a query to determine how many categories you will need. Second, use the information from step one to build a new SQL statement on the fly, probably involving an additional join back to the same table for each category, using the PIVOT keyword, or both. Finally, execute the SQL statement built in step two.
I have two datasets, one in SSMS and one in Oracle I'm trying to combine through SSRS. Technically I have two questions regarding the results I'm getting.
Dataset 1:
DataSet1 - Sales - MS
Part Location Transaction_date QTY_SOLD
1234 New York 06/01/2017 1
1235 New York 06/01/2017 4
Dataset 2 - Returns - Oracle
Part Location Purchase_Date QTY_RTN Reason
1235 New york 06/01/2017 2 Broken`
What I'm wanting to get:
Part Location Date QTY_SOLD QTY_RTN Reason
1234 New York 06/01/2017 1 NULL NULL
1235 New York 06/01/2017 4 2 Broken
I have lookup expressions set to join on part, location, date for qty_rtn and reason columns.
Part one, 1234 with no returns does not show up. The first dataset should return ~1400 items. The second dataset should return the same theoretically, but since that info is manually entered the purchase_date does not always match the transaction_date (this is fine. half the purpose of this is to find those mistakes and get someone to go back and correct the data). When I run the query, I get ~400 items.
Part two, when I do a preview from within Studio, the MS and Oracle data shows up. When I pull from the web interface, only the MS data shows up. I've checked that the credentials on both sides are correct and have the correct connection strings as well.
Any thoughts are appreciated.
Not sure what was broken, but I wound up deleting and re-creating the report from scratch and it works with all the data as it's supposed to. Web interface is also not missing data.
When I fill my DGV from a bound Access Table, it orders by Primary ID to a point - I say that because I have 1200 records and the DGV is filled in the following order, assuming we are just looking at the Prim. ID column -
1200
1199
1198
1197
1196
1
2
3
4
.....
1195
using the code below -
Me.ClientListTableAdapter.Fill(Me.ReportGenieDatabaseDataSet2.ClientList)
Sorry if this is vague but it's all I got. I hope to be able to order by descending ID - just like the Access table shows.
Also when I look up the .Last value it reads the "1195"
There is no order in a table. You can often find from one query to the next, records will be returned in different orders. You MUST thus use a SQL statement to set the order of the data returned. It is likely now you are using some SQL statement to retrieve this data – simply all a ORDER BY clause to that sql.
Using Access 2010 and its version of SQL, I am trying to find a way to relate two tables in a query where I do not have strict, unique values in each table, using concatenated fields that are mostly unique, then matching each unmatched next record (measured by a date field or the record id) in each table.
My business receives checks that we do not cash ourselves, but rather forward to a client for processing. I am trying to build a query that will match the checks that we forward to the client with a static report that we receive from the client indicating when checks were cashed. I have no control over what the client reports back to us.
When we receive a check, we record the name of the payor, the date that we received the check, the client's account number, the amount of the check, and some other details in a table called "Checks". We add a matching field which comes as close as we can get to a unique identifier to match against the client reports (more on that in a minute).
Checks:
ID Name Acct Amt Our_Date Match
__ ____ ____ ____ _____ ______
1 Dave 1001 10.51 2/14/14 1001*10.51
2 Joe 1002 12.14 2/28/14 1002*12.14
3 Sam 1003 50.00 3/01/14 1003*50.00
4 Sam 1003 50.00 4/01/14 1003*50.00
5 Sam 1003 50.00 5/01/14 1003*50.00
The client does not report back to us the date that WE received the check, the check number, or anything else useful for making unique matches. They report the name, account number, amount, and the date of deposit. The client's report comes weekly. We take that weekly report and append the records to make a second table out of it.
Return:
ID Name Acct Amt Their_Date Unique1
__ ____ ____ ____ _____ ______
355 Dave 1001 10.51 3/25/14 1001*10.51
378 Joe 1002 12.14 4/04/14 1002*12.14
433 Sam 1003 50.00 3/08/14 1003*50.00
599 Sam 1003 50.00 5/11/14 1003*50.00
Instead of giving us back the date we received the check, we get back the date that they processed it. There is no way to make a rule to compare the two dates, because the deposit dates vary wildly. So the closest thing I can get for a unique identifier is a concatenated field of the account number and the amount.
I am trying to match the records on these two tables so that I know when the checks we forward get deposited. If I do a simple join using the two concatenated fields, it works most of the time, but we run into a problem with payors like Sam, above, who is making regular monthly payments of the same amount. In a simple join, if one of Sam's payments appears in the Return table, it matches to all of the records in the Checks table.
To limit that behavior and match the first Sam entry on the Return table to the first Sam entry on the Checks table, I wrote the following query:
SELECT return.*, checks.*
FROM return, checks
WHERE (( ( checks.id ) = (SELECT TOP 1 id
FROM checks
WHERE match = return.unique1
ORDER BY [our_date]) ));
This works when there is only one of Sam's records in the Return table. The problem comes when the second entry for Sam hits the Return table (Return.ID 599) as the client's weekly reports are added to the table. When that happens, the query appropriately (for my purposes) only lists that two of Sam's checks have been processed, but uses the "Top 1 ID" record to supply the row's details from the Return table:
Checks_Return_query:
Checks.ID Name Acct Amt Our_Date Their_Date Return.ID
__ ____ ____ ____ _____ ______ ________
1 Dave 1001 10.51 2/14/14 3/25/14 355
2 Joe 1002 12.14 2/28/14 4/04/14 378
3 Sam 1003 50.00 3/01/14 3/08/14 433
4 Sam 1003 50.00 4/01/14 3/08/14 433
In other words, the query repeats the Return table info for record Return.ID 433 instead of matching Return.ID 599, which is I guess what I should expect from the TOP 1 operator.
So I am trying to figure out how I can get the query to take the two concatenated fields in Checks and Return, compare them to find matching sets, then select the next unmatched record in Checks (with "next" being measured either by the ID or Our_Date) with the next unmatched record in Return (again, with "next" being measured either by the ID or Their_Date).
I spent many hours in a dark room turning the query into various joins, and back again, looking at functions like WHERE NOT IN, WHERE NOT EXISTS, FIRST() NEXT() MIN() MAX(). I am afraid I am way over my head.
I am beginning to think that I may have a structural problem, and may need to write the "matched" records in this query to another table of completed transactions, so that I can differentiate between "matched" and "unmatched" records better. But that still wouldn't help me if two of Sam's transactions are on the same weekly report I get from my client.
Are there any suggestions as to query functions I should look into for further research, or confirmation that I am barking up the wrong tree?
Thanks in advance.
I'd say that you really need another table of completed transactions, it could be temporary table.
Regarding your fears "... if two of Sam's transactions are on the same weekly report ", you can use cursor in order to write records "one-by-one" instead of set based transaction.
Short background: I have an SQLite database, a couple of gb in size and growing. It contains a bunch of very simple tables. Each table consists of a 64-bit integer primary index field (TStamp) and a value field (Val). The field TStamp is actually an long-int representation of a date-time. The tables have widely varying row-counts and somewhat variable content types, but that shouldn't matter. A master table (tbIndDate) holds a full range of dates, has the same primary index (TStamp) as the other tables, and holds human-readable date-time in its Val field. For instance,
The master index table, named tbIndDate:
TStamp Val
634082688000000000 5/1/2010 0:00:00
634082691000000000 5/1/2010 0:05:00
634082694000000000 5/1/2010 0:10:00
634082697000000000 5/1/2010 0:15:00
etc etc
A sample table for automation tag 6FI1.PV, named tb6FI1%PV:
TStamp Val
634085793000000000 41.7
634085796000000000 42.83
634085799000000000 41.44
634085802000000000 40.43
634085805000000000 39.78
etc etc
Getting data into the tables is handled by a little vb.net program, and when a new automation tag is added to the capture list then the program creates a new table using the automation tag name, and begins populating it. That all works real slick.
OK. I've started building a tool for getting data out of the database. It works great for inner joins:
SELECT [tbIndDate].[Val] AS 'Timestamp',[tb6FI1%PV].[Val] AS '6FI1.PV',
[tb6FI34%PV].[Val] AS '6FI34.PV',[tb6AI32%PV].[Val] AS '6AI32.PV'
FROM [tbIndDate],[tb6FI1%PV],[tb6FI34%PV],[tb6AI32%PV]
WHERE [tbIndDate].[TStamp]=[tb6FI1%PV].[TStamp]
AND [tbIndDate].[TStamp]=[tb6FI34%PV].[TStamp]
AND [tbIndDate].[TStamp]=[tb6AI32%PV].[TStamp];
This returns:
Timestamp 6FI1.PV 6FI34.PV 6AI32.PV
1/1/2013 0:00:00 42.4679 1.499 0.8439
1/1/2013 0:05:00 40.3628 1.5048 0.8435
1/1/2013 0:10:00 38.2652 1.5028 0.8436
1/1/2013 0:15:00 37.8582 1.5029 0.8436
Yay! :)
I've also gotten some averaging and time-interval queries working.
However, because tag data is not all available for all dates, I'd like to create the option to list all dates in the master index even if some of the tag tables do not have matching data.
A SELECT query with a left outer join, in other words. Everyone knows that. The data might look like:
Timestamp 6FI1.PV 6FI34.PV 6AI32.PV
1/1/2013 0:00:00 42.4679 1.499 NULL
1/1/2013 0:05:00 40.3628 1.5048 NULL
1/1/2013 0:10:00 38.2652 NULL NULL
1/1/2013 0:15:00 37.8582 NULL 0.8436
Trouble is, none of the SQL I've tried has worked. Here's one that didn't go:
SELECT [tbIndDate].[Val] AS 'Timestamp',[tb6FI1%PV].[Val] AS '6FI1.PV',
[tb6FI34%PV].[Val] AS '6FI34.PV'
FROM [tbIndDate],[tb6FI1%PV],[tb6FI34%PV]
LEFT JOIN [tbIndDate] ON [tbIndDate].[TStamp]=[tb6FI1%PV].[TStamp]
LEFT JOIN [tbIndDate] ON [tbIndDate].[TStamp]=[tb6FI34%PV].[TStamp];
The error was "SQL error or missing database, ambiguous column name: tbIndDate.Val"
I've tried copying the syntax from several examples, but none are exactly the same and my attempts fail.
Am I doing the aliases wrong? The square brackets to accommodate special characters in table names? I'm a complete SQL beginner, so don't hold back with the advice.
It looks like the problem is that you're trying to join [tbIndDate] several times. Try this:
SELECT [tbIndDate].[Val] AS 'Timestamp',[tb6FI1%PV].[Val] AS '6FI1.PV',
[tb6FI34%PV].[Val] AS '6FI34.PV'
FROM [tbIndDate]
LEFT JOIN [tb6FI1%PV] ON [tbIndDate].[TStamp]=[tb6FI1%PV].[TStamp]
LEFT JOIN [tb6FI34%PV] ON [tbIndDate].[TStamp]=[tb6FI34%PV].[TStamp];