Using top clause in Access sub report - sql

I'm making a report in Access 2003 that contains a sub report of related records. Within the sub report, I want the top two records only. When I add "TOP 2" to the sub report's query, it seems to select the top two records before it filters on the link fields. How do I get the top two records of only those records that apply to the corresponding link field? Thanks.

The sample query below is supposed to return the two most recent orders for each customer (instead of all orders). Note that Order is a reserved word, so the table name must be bracketed:
select
    [Order].ID,
    [Order].Customer_ID,
    [Order].PlacementDate
from
    [Order]
where
    [Order].ID in
    (
        select top 2
            RecentOrder.ID
        from
            [Order] as RecentOrder
        where
            RecentOrder.Customer_ID = [Order].Customer_ID
        order by
            RecentOrder.PlacementDate Desc
    )
A query like this could be used in your sub-report to avoid using a temporary table.
CAVEAT EMPTOR: I did not test this sample query, and I don't know if it would work for a report running against a Jet database (we don't use Access to store data, and we avoid Access reports like the plague :-). But it should work against SQL Server.
I also don't know how well it would perform in your case. As usual, it depends. :-)
BTW, speaking of performance and hacks: I would not consider the use of a temporary table a hack. At worst, this trick can be considered a more-complicated-than-necessary interface to the report. :-) And such a temporary table may actually be one of the better ways to improve performance. So don't hurry to write it off. :-)
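For reference, such a temporary table can be populated in Access with a make-table query; a minimal sketch (the target table name TempRecentOrders is hypothetical):
SELECT [Order].ID, [Order].Customer_ID, [Order].PlacementDate
INTO TempRecentOrders
FROM [Order];
The sub-report would then run its TOP 2 filter against TempRecentOrders instead of the live table.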

I've got two suggestions:
1) Pass your master field (on the parent form) to the query as a parameter (you could also reference a field on the parent form directly; see the sketch at the end of this answer)
2) You could fake row numbers in Access and limit the results to rownum <= 2 (a fuller sketch follows the snippet below). E.g.,
SELECT o1.order_number, o1.order_date,
       (SELECT COUNT(*) FROM orders AS o2
        WHERE o2.order_date <= o1.order_date) AS RowNum
FROM orders AS o1
ORDER BY o1.order_date
(from http://groups.google.com/group/microsoft.public.access.queries/msg/ec562cbc51f03b6e?pli=1)
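To keep only the first two orders per customer, the counting subquery can be correlated on the link field and moved into the WHERE clause. A minimal sketch, assuming orders has a customer_id link column (not part of the snippet above):
SELECT o1.customer_id, o1.order_number, o1.order_date
FROM orders AS o1
WHERE (SELECT COUNT(*) FROM orders AS o2
       WHERE o2.customer_id = o1.customer_id
       AND o2.order_date <= o1.order_date) <= 2
ORDER BY o1.customer_id, o1.order_date;
Flipping the inner comparison to o2.order_date >= o1.order_date would keep the two most recent orders instead of the two earliest.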
However, this kind of query might return a read-only recordset, so it might not be appropriate if you needed to do the same thing on a form instead of a report.
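As for suggestion 1, a sketch of referencing the parent form directly from the sub-report's query (the form and control names here are hypothetical):
SELECT TOP 2 o1.order_number, o1.order_date
FROM orders AS o1
WHERE o1.customer_id = [Forms]![frmCustomers]![txtCustomerID]
ORDER BY o1.order_date DESC;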

Error: TABLE_QUERY expressions cannot query BigQuery tables

This is a follow-up question regarding Jordan's answer here: Weird error in BigQuery
I had been querying reference tables within TABLE_QUERY() for quite some time. Now, following the recent changes Jordan refers to, many of our queries are broken... I would like to ask the community's advice on an alternative to what we are doing.
I have tables containing events ("MyTable_YYYYMMDD"). I want to query my data for the period of one (or several) specific campaigns. The period of each campaign is stored in a table with all campaign data (ID, StartCampaignDate, EndCampaignDate). In order to query only the relevant tables, we use TABLE_QUERY(), and within the TABLE_QUERY() we construct a list of all relevant table names based on the campaign data.
This query runs in various forms many times with different params. The reason for using the wildcard function (rather than querying the entire dataset) is performance, execution costs, and maintenance costs. So having it query all tables and filter just the results is not an option, as it drives execution costs too high.
A sample query looks like this:
SELECT
  *
FROM
  TABLE_QUERY([MyProject:MyDataSet], 'table_id IN
    (SELECT CONCAT("MyTable_", STRING(Year*100+Month)) TBL_NAME
     FROM DWH.Dim_Periods P
     CROSS JOIN DWH.Campaigns AS LC
     WHERE ID IN ("86254e5a-b856-3b5a-85e1-0f5ab3ff20d6")
     AND DATE(P.Date) BETWEEN DATE(StartCampaignDate) AND DATE(EndCampaignDate))')
This is now broken...
My question: the information about which tables should be queried is stored in a reference table. How would you query only the relevant tables (partitions) when TABLE_QUERY() is no longer allowed to query reference tables?
Many thanks
The "simple" way I see is split it to two steps
Step 1 - build list that will be used to filter table_id's
SELECT GROUP_CONCAT_UNQUOTED(
CONCAT('"',"MyTable_",STRING(Year*100+Month),'"')
) TBL_NAME_LIST
FROM DWH.Dim_Periods P
CROSS JOIN DWH.Campaigns AS LC
WHERE ID IN ("86254e5a-b856-3b5a-85e1-0f5ab3ff20d6")
AND DATE(P.Date) BETWEEN DATE(StartCampaignDate) AND DATE(EndCampaignDate)
Note the change from your query: it transforms the result into a list that you will use in step 2.
Step 2 - the final query:
SELECT
*
FROM
TABLE_QUERY([MyProject:MyDataSet],
'table_id IN (<paste list (TBL_NAME_LIST) built in first query>)')
The steps above are easy to implement in any client you are potentially using.
If you use it from within the BigQuery Web UI, this requires a few extra manual "moves" that you might not be happy about.
My answer is obvious, and you most likely already have this as an option, but I wanted to mention it.
This is not an ideal solution, but it seems to do the job.
In my previous query I passed the list of IDs as a parameter from an external process that constructed the query. I wanted this process to be unaware of any logic implemented in the query.
Eventually we came up with this solution:
Instead of passing a list of IDs, we pass a JSON that contains the relevant metadata for each ID. We parse this JSON within the TABLE_QUERY() function. So instead of querying a physical reference table, we query a sort of "table variable" that we have put in a JSON.
Below is a sample query that runs on a public dataset and demonstrates this solution.
SELECT
YEAR,
COUNT(*) CNT
FROM
TABLE_QUERY([fh-bigquery:weather_gsod], 'table_id in
(Select table_id
From
(Select table_id,concat(Right(table_id,4),"0101") as TBL_Date from [fh-bigquery:weather_gsod.__TABLES_SUMMARY__]
where table_id Contains "gsod"
)TBLs
CROSS JOIN
(select
Regexp_Replace(Regexp_extract(SPLIT(DatesInput,"},{"),r"\"fromDate\":\"(\d\d\d\d-\d\d-\d\d)\""),"-","") as fromDate,
Regexp_Replace(Regexp_extract(SPLIT(DatesInput,"},{"),r"\"toDate\":\"(\d\d\d\d-\d\d-\d\d)\""),"-","") as toDate,
FROM
(Select
"[
{
\"CycleID\":\"123456\",
\"fromDate\":\"1929-01-01\",
\"toDate\":\"1950-01-10\"
},{
\"CycleID\":\"123456\",
\"fromDate\":\"1970-02-01\",
\"toDate\":\"2000-02-10\"
}
]"
as DatesInput)) RefDates
WHERE TBLs.TBL_Date>=RefDates.fromDate
AND TBLs.TBL_Date<=RefDates.toDate
)')
GROUP BY
YEAR
ORDER BY
YEAR
This solution is not ideal as it requires an external process to be aware of the data stored in the reference tables.
Ideally the BigQuery team will re-enable this very useful functionality.

MS Access 2010 SQL Top N query by group performance issue (continued)

I have significant performance issues (up to time-out) in MS Access 2010 with the query below. The table TempTableAnalysis contains between 10,000 and 15,000 records. I have already received input from this forum to work with a temporary table in the TOP 10 query (MS Access 2010 SQL Top N query by group performance issue).
Can anyone explain how to implement the temporary table in the subquery and how to join it? I can't get it to work.
Any other suggestions to improve performance are highly appreciated.
Here is my query:
SELECT
t2.Loc,
t2.ABCByPick,
t2.Planner,
t2.DmdUnit,
ROUND(t2.MASE,2) AS MASE,
ROUND(t2.AFAR,2) AS AFAR
FROM TempTableAnalysis AS t2
WHERE t2.MASE IN (
SELECT TOP 10 t1.MASE
FROM TempTableAnalysis AS t1
WHERE t1.ABCByPick = t2.ABCByPick
ORDER BY t1.MASE DESC
)
ORDER BY
t2.ABCByPick,
t2.MASE DESC;
Optimizing Access Query Performance For Large Data Sets
Based on your posted SQL Query, you have some options available to optimize and speed up the performance.
SELECT
t2.Loc,
t2.ABCByPick,
t2.Planner,
t2.DmdUnit,
ROUND(t2.MASE,2) AS MASE,
ROUND(t2.AFAR,2) AS AFAR
FROM TempTableAnalysis AS t2
...
This is the first part, where TempTableAnalysis is the multi-thousand-record subquery. If you want to squeeze a little more performance out of this "temp" table, don't use it as a dynamic query (i.e., calculated on demand each time the query is opened); instead, construct a macro that pushes the output to a static table:
Appending Subquery Data to a Static Table:
Create a QUERY object and change its type to DELETE. Design it to delete the contents of your "temporary" table object. If you prefer using SQL, the command will look like this:
DELETE My_Table.*
FROM My_Table;
Create a QUERY object and change its type to APPEND. Design it to query all fields from the query defined by the SQL statement of this OP. Again, the SQL version of this task has the following syntax:
INSERT INTO StaticAnalysisTable ( ID, Loc, Item, AvgOfScaledError )
SELECT t1.ID, t1.Loc, t1.Item, t1.AvgOfScaledError
FROM TempTableAnalysis as t1;
The next step, automating the population of this static table, is optional. It's simple, however, and makes it less likely that you will forget to "refresh" and access your static table while it has stale data... causing inaccuracies in your results.
Create a macro with two steps. Each step will have the definition OPEN QUERY. When prompted for the query to open, reference the objects you created in the previous two steps, in the following order (important): (1) the DELETE query, then (2) the APPEND query.
SQL Query Comments and Suggestions
The following part of the posted SQL query could use some help:
...
WHERE t2.MASE IN (
SELECT TOP 10 t1.MASE
FROM TempTableAnalysis AS t1
WHERE t1.ABCByPick = t2.ABCByPick
ORDER BY t1.MASE DESC
)
ORDER BY
t2.ABCByPick,
t2.MASE DESC;
There is a join between the subquery that generates the TOP 10 data and the outermost query that correlates those results with the rest of the table data. This isn't necessary if TempTableAnalysis.MASE represents a key value.
The ORDER BY in the innermost query isn't necessary unless it is intended to force some sort of selection criteria (as when using SQL analytic functions), and this doesn't look like one of those cases. Ordering records from large data sets is also a wasteful CPU and memory sink.
EDIT: Just as a counter-point argument, the ORDER BY clause used beside a TOP N query does have a purpose: Access applies TOP N only after the ordering, so here it defines which ten MASE values are kept. Just to round out the discussion, another SO thread talks about How to Select Top 10 in an Access Query.
WHERE t2.MASE IN (...
You may be experiencing performance blocks with very large in-list set operations. On an Oracle database server, other developers and I have discovered that there is a limit to the number of discrete elements in an in-list query operator. That value was in the thousands... and it may be further limited by server and database resources.
Consider using a SQL JOIN operator. The place where you define TABLE objects can also be populated with SQL-defined queries under aliases, known as INLINE VIEWS. Since you're using Access, if an inline view does not work directly, just define another Access QUERY object and reference it in your final query as if it were a table...
A possible rewrite to the ending part of the original query:
SELECT
t2.Loc,
t2.ABCByPick,
t2.Planner,
...
FROM TempTableAnalysis AS t2,
(SELECT TOP 10 t1.MASE, t1.ABCByPick
FROM TempTableAnalysis AS t1
ORDER BY t1.MASE DESC) AS ttop
WHERE t2.MASE = ttop.MASE
AND t2.ABCByPick = ttop.ABCByPick
ORDER BY
t2.ABCByPick,
t2.MASE DESC;
You will definitely need to run through these recommendations and validate the output data for accuracy. This represents approaches to capturing some of the "low-hanging fruit" (easy items) that you can pursue to speed up your query and reporting operations.
Conclusions and Closing Comments
As background for other readers: the database object TempTableAnalysis is not a static table. It is the result of a subquery presented in another SO post requesting help with an Access TOP N query. The query draws from multiple tables approaching 10,000 records in size (each?).
Tip: a query result in Access ALSO has potential table-like behaviors. You can push the output to a table for joining (as described above) or just join to the query object itself (be careful, though, especially when you get to "chaining" multiple query operations...)
The strategy of this solution was:
To minimize the number of trips through one or more instances of this very large table.
To pre-process and index optimize any data that would otherwise be "static" for the duration of its analysis.
To audit and review the SQL code used to obtain the final results.
Definitely look into Access MACROS. Coupled with identifying static data in your data sets, you can offload the processing of your complex background analytic queries to improve the user experience when users view and query the final results. Good luck!

How do I get two sums of two separate tables with no joint keys in a single crystal report?

I have data in two tables (see below for a sample). How do I create a Crystal report (more of a "score card", really) displaying only sum(table1.column1) and sum(table2.column1), with no other details? When I try, one of the sums gets way too big, indicating it has been included in some inner loop in the calculations.
Table1:
Column1: Integer
Column2: Varchar(100)
...
Table2:
Column1: Integer
Column2: Varchar(50)
...
Note - there are no common keys; the only relation between the tables is that they cover the same business area.
Add a grouping level on Table1.uid. Create a running total Table1Sum that sums Table1.Column1, evaluates on change of the Table1.uid group, and never resets. Create a running total Table2Sum that sums Table2.Column1, evaluates on every record, and resets on change of the Table1.uid group. Print both running totals in the report footer.
Place your queries in separate subreports. (This is what I'd probably do.)
The first one obviously requires (1) a unique key in Table1 and (2) printing the values in the footer. If those constraints won't work, two subreports should still work.
select t1.cnt, t2.cnt
from ( select count(*) cnt from table1 where... ) t1
, ( select count(*) cnt from table2 where... ) t2
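Since the question asks for sums rather than counts, the same pattern with SUM (column and table names taken from the question) would be:
select t1.total1, t2.total2
from ( select sum(Column1) total1 from Table1 ) t1
, ( select sum(Column1) total2 from Table2 ) t2
Each derived table returns a single row, so the cross join yields one row holding both totals.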
If you want to avoid the sub-query approach, the only real route that I can think of is to use sub-reports.
2 ways I can think of:
Put each query in its own sub-report, and link them into your main report.
Put one query in your main report, and the other in a linked sub-report.
I answer this with the caveat that it will almost certainly be slower than simply using one query (as in Randy's answer), because Crystal Reports is not as efficient as the DB engine. It's also probably going to be harder to maintain. Basically, while you certainly can do it this way, I'm not sure I would.
You could use two SQL Expression fields. Each field needs to return a scalar value. You can correlate (link) each query with the main-report's query as well.
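For example, each SQL Expression field could hold a scalar subquery like the following (a sketch; the exact quoting syntax depends on your database and connection):
(SELECT SUM(Column1) FROM Table1)
and, in a second field:
(SELECT SUM(Column1) FROM Table2)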

VB6 SQL 2005 Database Index Question

I have a VB app that accesses a SQL database. I think it's running slow, and I thought maybe I didn't have the tables properly indexed. I was wondering how you would create the indexes? Here's the situation.
My main loop is
Select * from Docrec
Order by YearFiled,DocNumb
Inside this loop I have two other database hits.
Select * from Names
Where YearFiled = DocRec.YearFiled
and Volume = DocRec.Volume and Page = DocRec.Page
Order by SeqNumb
Select * from MapRec
Where FiledYear = DocRec.YearFiled
and Volume = DocRec.Volume and Page = DocRec.Page
Order by SeqNumb
Hopefully I made sense.
Try it in one query using INNER JOIN:
SELECT * FROM DocRec d
INNER JOIN Names n ON d.YearFiled = n.YearFiled AND d.Volume = n.Volume AND d.Page = n.Page
INNER JOIN MapRec m ON m.FiledYear = d.YearFiled AND m.Volume = d.Volume AND m.Page = d.Page
ORDER BY YearFiled, DocNumb
You will then have only one query against the database. The problem may be that you currently hit the database many times and get only one (or a few) rows each time.
Off the top, one thing that would help would be determining if you really need all columns.
If you don't, instead of SELECT *, select just the columns you need - that way you're not pulling as much data.
If you do, then from SQL Server Management Studio (or whatever you use to manage the SQL Server) you'll need to look at what is indexed and what isn't. The columns you tend to search on the most would be your first candidates for an index.
Addendum
Now that I've seen your edit, it may help to look at why you're doing the queries the way you are, and see if there isn't a way to consolidate it down to one query. Without more context I'd just be guessing at more optimal queries.
In general, looping through records is a poor idea. Can you not do a set-based query that gives you everything you need in one pass?
As far as indexing, consider any fields that you use in the ordering or WHERE clauses, and any fields that are in joins. Primary keys are indexed as part of setting up the primary key, but foreign keys are not; people often forget that they need to index them as well.
Never use SELECT * in a production environment. It is a poor practice. Do not ever return more data than you need.
I don't know if you need the loop. If all you are doing is grabbing the records in MapRec that match DocRec, and then the same for the second table, then you can do this without a loop using inner join syntax.
select columnlist from maprec m inner join docrec d on (m.filedyear = d.yearfiled and m.volume = d.volume and m.page = d.page)
and then again for the second table...
You could also trim up your queries to return only the columns needed instead of returning all if possible. This should help performance.
To create an index yourself in SQL Server 2005, go to the design view of the table and select the Manage Indexes & Keys toolbar item.
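If you prefer a script, the T-SQL equivalent is CREATE INDEX; the index names and column choices below are an assumption based on the WHERE and ORDER BY clauses in the question:
CREATE INDEX IX_Names_YearFiled_Volume_Page
    ON Names (YearFiled, Volume, Page);
CREATE INDEX IX_MapRec_FiledYear_Volume_Page
    ON MapRec (FiledYear, Volume, Page);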
You can use the Database Engine Tuning Advisor. Create a trace of your queries (using SQL Server Profiler), and the Advisor will recommend, and can create, the indexes needed to optimize your query executions.
UPDATE SINCE YOUR FIRST COMMENT TO ME:
You can still do this by running the first query, then the second and third, without a loop, as I have shown above. Here's the trick: I am thinking you need to tie the first query to the second and third ones, hence the loop.
It's been a while since I have done VB6 recordsets, BUT I do recall the ability to filter a recordset once it has been returned from the DB. So in this case you could keep your loop, but instead of calling SQL every time in the loop, you would simply filter the resulting recordset data based on the current DocRec record. You would initialize/load the second and third queries before this loop to load the data. Using the join syntax I gave above will load each of those recordsets with the rows matching the parent table (DocRec).
With this, you will still only hit the DB three times, but you retain the loop you need: the parent DocRec table is traversed so you can do work on it AND on the child tables when you do have a match.
Here's a few links on ado recordset filtering....
http://www.devguru.com/technologies/ado/QuickRef/recordset_filter.html
http://msdn.microsoft.com/en-us/library/ee275540(BTS.10).aspx
http://www.w3schools.com/ado/prop_rs_filter.asp
With all this said... I have a strange feeling that perhaps it could be solved with just a LEFT JOIN on your tables:
select * from docrec d
left join maprec m on (d.YearFiled= m.FiledYear and d.Volume = m.Volume and d.Page = m.Page)
left join names n on (d.YearFiled = n.YearFiled and d.Volume = n.Volume and d.Page = n.Page)
This will return all DocRec records AND add the MapRec and Names values where they match, or NULL where they don't.
If this fits your need, it will only hit the DB once.

Semi-Distinct MySQL Query

I have a MySQL table called items that contains thousands of records. Each record has a user_id field and a created (datetime) field.
I'm trying to put together a query to SELECT 25 rows, passing a string of user ids as a condition, sorted by created DESC.
In some cases, there might be just a few user ids, while in other instances, there may be hundreds.
If the result set is greater than 25, I want to pare it down by eliminating duplicate user_id records. For instance, if there were two records for user_id = 3, only the most recent (according to created datetime) would be included.
In my attempts at a solution, I am having trouble because while, for example, it's easy to get a result set of 100 (allowing duplicate user_id records), or a result set of 16 (using GROUP BY for unique user_id records), it's hard to get 25.
One logical approach, which may not be the correct MySQL approach, is to get the most recent record for each user_id and then, if the result set is less than 25, begin adding a second record for each user_id until the 25-record limit is met (maybe a third, fourth, etc. record for each user_id would be needed).
Can this be accomplished with a MySQL query, or will I need to take a large result set and trim it down to 25 with code?
I don't think what you're trying to accomplish is possible as a single SQL query. Your desire is to return 25 rows no matter what the natural data groupings are, whereas SQL is picky about returning results based on data groupings.
If you want a purely MySQL-based solution, you may be able to accomplish this with a stored procedure. (Supported in MySQL 5.0.x and later.) However, it might just make more sense to run the query to return all 100+ rows and then trim it programmatically within the application.
This will get you the most recent record for each user (rows with no newer row for the same user) --
SELECT i1.user_id, i1.created
FROM items AS i1
LEFT JOIN items AS i2
  ON i1.user_id = i2.user_id AND i2.created > i1.created
WHERE i2.id IS NULL
This will get you the most recent two records for each user (rows with fewer than two newer rows for the same user) --
SELECT i1.user_id, i1.created
FROM items AS i1
LEFT JOIN items AS i2
  ON i1.user_id = i2.user_id AND i2.created > i1.created
GROUP BY i1.id, i1.user_id, i1.created
HAVING COUNT(i2.id) < 2
Try working from there.
You could nicely put this into a stored procedure.
My opinion is to use application logic, as this is very much application-layer logic you are trying to implement at the DB level, i.e., filtering down the results to make the search more useful to the end user.
You could implement a stored procedure (personally I would never do such a thing) or just have the application decide which 25 results to keep.
One approach would be to get the most recent item from each user, followed by the most recent items from all users, and limit that. You could construct pathological examples where this probably isn't what you want, but it should be pretty good in general.
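A sketch of that approach as a single query, assuming an items(id, user_id, created) table and a hypothetical id list (3, 7, 9): each user's newest row sorts first, the remaining rows fill out the set by recency, and LIMIT caps the result at 25.
SELECT i1.id, i1.user_id, i1.created,
       (i1.created = (SELECT MAX(i2.created)
                      FROM items AS i2
                      WHERE i2.user_id = i1.user_id)) AS is_latest
FROM items AS i1
WHERE i1.user_id IN (3, 7, 9)
ORDER BY is_latest DESC, i1.created DESC
LIMIT 25;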
Unfortunately, there is no easy way :( I had to do something similar when I built a report for my company that would pull up customer disables logged in a database. The only problem was that the disconnect is run and logged every 30 minutes, so the rows would not be distinct, since the timestamp was different in every disconnect. I solved this problem with subqueries. I don't have the exact code anymore, but I believe this is how I implemented it:
SELECT CORP, HOUSE, CUST,
(
    SELECT TOP 1 hsd
    FROM #TempTable t2
    WHERE t1.corp = t2.corp
    AND t1.house = t2.house
    AND t1.cust = t2.cust
    ORDER BY hsd DESC -- assuming hsd is the disable timestamp; without an ORDER BY, TOP 1 is arbitrary
) DisableDate
FROM #TempTable t1
GROUP BY corp, house, cust -- selecting distinct
So my answer is to eliminate the non-distinct column from the query by using subqueries. There might be an easier way to do it, though. I'm curious to see what others post.
Sorry, I keep editing this; I keep trying to find ways to make it easier to show what I did.