I am developing a small web app (angularjs/jquery front end, postgresql 9.3 backend) in which I want to present a read-only "grid" view of a largish set of records (a few million). I have a set of filters based on facets of the data that I would like the user to be able to apply serially; that is, one filter is applied and then the next filter is applied. The user can choose both the filters and the filter settings. This ends up being a set of logical AND operations (perhaps requiring SQL joins as well).
I am interested in what folks do on the backend to improve the user experience. In particular, I can imagine:
Apply filters "dynamically" as a SQL query whenever pagination or additional filtering is applied
Create a cache at each level of filtering so that I can update data more quickly
There are clearly other options and I would like to hear what others would do in this situation.
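For concreteness, a minimal sketch of what one dynamically built query might look like (the table, the facet columns, and the page size are all made up for illustration):

SELECT r.id, r.name, r.status, r.created_at
FROM records r
JOIN categories c ON c.id = r.category_id    -- join required by one facet
WHERE c.name = 'widgets'                     -- filter 1
  AND r.created_at >= DATE '2014-01-01'      -- filter 2
  AND r.status = 'active'                    -- filter 3
ORDER BY r.created_at DESC
LIMIT 50 OFFSET 100;                         -- pagination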
Not sure the question is “answerable”, but here’s an example of what we’ve done. We have an application that does not do filtering but rather allows the user to select which data items they want to see. Like your application, these could include joins to multiple other tables. We have a “driver” table that holds the “user” version of the field name, a string for the inner join to the table that contains it, optionally a string for the WHERE clause if the inner join condition is not sufficient, and the name of the column in the DB.
We build the base query, then look at the entries for all the items the user has selected. We add distinct inner join clauses from those fields (if three columns come from one table, we only want to join it once). We add AND clauses to the WHERE clause, each encapsulated in parentheses. And we add the column names to the select list.
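A rough sketch of the idea, with invented names (this is not our actual schema):

-- One row per selectable data item.
CREATE TABLE driver (
    user_field_name varchar(100),  -- label shown to the user
    join_clause     varchar(400),  -- e.g. 'INNER JOIN address a ON a.person_id = p.id'
    where_clause    varchar(400),  -- optional extra predicate, may be NULL
    column_name     varchar(200)   -- e.g. 'a.city'
);

-- If the user picks "City" and "Zip" (both served by the address table),
-- the builder emits the join once and parenthesizes each WHERE fragment:
SELECT p.id, a.city, a.zip
FROM person p
INNER JOIN address a ON a.person_id = p.id
WHERE (a.current_flag = 1);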
Related
I have an SSIS package that is supposed to insert data from a flat file into a database table. For the sake of this example, let's say I want to insert User records. The data records come from other existing databases, so they already include a previously generated primary ID, which we would like to preserve and continue using. The records also include an email field which should be unique in the destination table; this is enforced by the schema. A given batch could include records that have previously been "migrated", and a user might also exist in more than one of the original systems with the same email address. In addition to avoiding errors, I would like to track any possible duplicates (on either the UserID or the email field) by writing them to a file.
Because matches can be made on either of the two fields, do I need to chain two Lookup Transformations? Or is there a way to specify an OR operation instead of AND when using multiple columns? Or is there a better-suited transform that I should be using?
Thank you in advance.
Well, let's split your question.
Can I do a Lookup with OR condition on two fields?
Yes, you can.
Suppose you are looking up against the User table. On the Lookup transformation's General page, specify Partial cache or No cache as the cache mode. Then design your query on the Connection page. Important: map your data flow fields to the query columns on the Columns page. That completes the preparation.
Go to the Advanced page and tick the Modify the SQL statement checkbox. Then edit the generated SQL statement into something like
SELECT * FROM (SELECT [ColA], [ColB], ...
               FROM [User]) [refTable]
WHERE [refTable].[ColA] = ? OR [refTable].[ColB] = ?
Then hit the Parameters button and map data flow columns to the parameters: the first ? is Parameter0, and so on.
As you see, it is possible but not easy.
Should you use two lookups or single complex lookup?
I would go for two lookups, as that allows finer control and better error reporting. With an OR statement you can only report that something among the unique fields matched; separate lookups let you be specific about which field matched and design special flow steps if needed.
I have 3 tables in the same database, with a couple of columns in common and the rest non-matching. I need to show them together in such a fashion that the user can distinguish between the source tables. I want to know if I can achieve this in the database itself, before passing the result on to my report UI or code-behind.
I have tried achieving this using OUTER JOIN and FULL OUTER JOIN.
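For example, something along these lines, where Id is the shared column and the other names are placeholders:

SELECT COALESCE(t1.Id, t2.Id, t3.Id) AS Id,
       t1.ColA,    -- NULL when the row is absent from Table1
       t2.ColB,    -- NULL when the row is absent from Table2
       t3.ColC     -- NULL when the row is absent from Table3
FROM Table1 t1
FULL OUTER JOIN Table2 t2 ON t2.Id = t1.Id
FULL OUTER JOIN Table3 t3 ON t3.Id = COALESCE(t1.Id, t2.Id);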
I just have a general question about setting an Access query to Dynaset (Inconsistent Updates). I know it opens the fields up for editing, and this increases the risk to data integrity, but what about in "controlled cases"?
For example, I have a table on the "one" side of 3 left outer joins. I want to allow edits (via a form) to any fields on the "one" side. The 3 outer joins are merely pulling information from these other tables to use in a calculated field in that query. So I need to show these calculations at this query level, but edit the primary table in the query. I know the changes I'm allowing in the form are just on the "one" side. Is this an allowable case for using Dynaset (Inconsistent Updates)? I just can't figure out an appropriate solution. I tried using a subquery for the one field instead of outer joins but that still left it locked.
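A sketch of the shape of such a query, with invented table and field names (Nz() just guards the calculation against the outer joins returning Null):

SELECT m.*, n.Note, Nz(r.Rate, 0) * Nz(q.Qty, 0) AS LineTotal
FROM ((tblMain AS m
LEFT JOIN tblRates AS r ON r.MainID = m.ID)
LEFT JOIN tblQty AS q ON q.MainID = m.ID)
LEFT JOIN tblNotes AS n ON n.MainID = m.ID;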
I'm having some trouble in Access 2002...
I have two tables: one containing around 60k occurrences, and an empty one holding a new column plus the foreign keys needed for the join. In my form, I set the record source to a query joining these two tables (a left join on the empty one). Basically, I end up with my 60k occurrences and my new column at the end.
Now, I need to allow my users to edit this field in my form. I found out that when the corresponding row exists in my (initially empty) table, I can edit the field just fine. However, since we need this table to contain only the occurrences that actually need the new column, I can't simply create a row for every occurrence.
Here is a schema of the two tables:

Table 1 (about 60k rows):
ID | Sequence | Col1 | Col2 | Col3 | Col5

Table 2 (0 rows):
ID | Sequence | Col6
And my query:
SELECT tblOne.*, tblTwo.Col6
FROM tblOne
LEFT JOIN tblTwo ON (tblOne.Sequence=tblTwo.Sequence) AND (tblOne.ID=tblTwo.ID);
If you're willing to consider a different approach, this could be easier with a form/subform design.
Base the main form on tblOne and base the subform on tblTwo. Use Sequence and ID as the master/child link fields (you can find that setting in the property sheet of the subform control).
With that design, the subform will display existing tblTwo rows which match the main form's current tblOne row. And you can add a new matching tblTwo row at the subform's new record --- it will "inherit" the Sequence and ID values of the current main form row.
By the way, Sequence is a reserved word. Rename that field if possible. If you must keep that name, you can avoid the risk of confusing the db engine by enclosing the name in square brackets or by qualifying the field name with the table name (or alias) in your SQL statements.
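For example, either of these forms keeps the engine from misreading the name:

SELECT [Sequence], Col6 FROM tblTwo;
SELECT tblTwo.Sequence, tblTwo.Col6 FROM tblTwo;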
If you have a query that is not updateable, check out Allen Browne's Tips on what might cause this: http://allenbrowne.com/ser-61.html
MS Access has a shortcoming: if you wish to edit data in list-like views, it generally works best for the data to be displayed in essentially the same structure in which it's stored in your tables. Edit: In reference to Yawar's comment that this is not an Access shortcoming, I'd point out that when developing in .NET it isn't uncommon to have a database structure quite unlike the data model classes used inside the application. In that case the GUI is built on the data model, so the database may look somewhat (or even quite) different from your data models/GUI.
Back to MS Access: when you use a table join to create the recordsource/recordset for a datasheet form or continuous form, it's my understanding that only one of the tables is going to be updateable. In other words, only one side of the join is updateable. And in many cases the recordset is not updateable at all, due to the DAO engine getting confused. Update: I have deduced from the link below that what I wrote above seems to be more true of SQL Server than of a JET/ACE backend.
The most common solution is as HansUp has suggested, use a form/subform approach. You can actually have a datasheet subform as a child of another datasheet subform, which would work quite well in your case here. There will just be an expandable plus sign at the far left of each record so you can add/edit/delete the record(s) in tblTwo.
Another option is to use an ActiveX grid control such as the iGrid from 10tec which means you'll write quite a bit of code for all kinds of things, like loading the recordset, writing changes/additions/deletions back to the database, handling formatting of cells, etc.
Yet another option is to use a fabricated ADO recordset. This is a terribly clumsy approach and I can't say I've really seen it in use; I've mostly just experimented with it and read about it in theory. The problem is that you have to create a fabricated recordset that is nearly identical to the one you generated, and then loop through and copy all of the records from the generated recordset into your fabricated one. That's a tremendous amount of overhead, especially for this many records. And then you must write code once again to write all additions/changes/deletions back to the database. Handling the creation of new primary keys can be tricky. This approach is not easy or simple, and is not something I'd recommend a VBA beginner tackle.
If you're using SQL Server you should check out the following article at Microsoft's website. It covers a variety of material including updating multiple tables from a single recordsource/view. http://technet.microsoft.com/en-us/library/bb188204%28v=sql.90%29.aspx
We receive a data feed from our customers, and we get roughly the same schema each time, though it can change on the customer end since they use a 3rd-party application. When we receive the data files, we import the data into a staging database with a table for each data file (students, attendance, etc.).

We then want to compare that data to the data we already have in the database for that customer and see what has changed from the previous run (either a column value has changed or the whole row was deleted). We want to write the updated values or deleted rows to an audit table so we can go back and see what changed between imports. We don't want to update the data itself; we only want to record what's different between the two datasets. We will then delete all the data from the customer database and import the data exactly as-is from the new data files without changing it (this directive has been handed down and cannot change).

The big problem is that I need to do this dynamically, since I don't know exactly what schema I'm going to get from each customer; they can make customizations to their tables. I need to dynamically determine what tables exist in the destination, and their structure, and then look at the source and compare the values to see what has changed in the data.
Additional info:
There are no ID columns in the source, though there are several columns that together can serve as a surrogate key identifying a distinct row.
I'd like to do this generically for each table without hard-coding values, though I might have to keep the surrogate key columns for each table in a separate reference table.
I can use SSIS, stored procedures, triggers, etc., whichever makes more sense. I've looked at all of them, including tablediff, and none seems to have everything I need, or the logic gets extremely complex once I get into them.
Of course any specific examples anyone has of something like this they have already done would be greatly appreciated.
Let me know if there's any other information that would be helpful.
Thanks
I've worked on a similar problem and used a series of metadata tables to dynamically compare datasets. These metadata tables described which datasets needed to be staged and which combination of columns (and their data types) served as the business key for each table.
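As a rough sketch of what such metadata tables might look like (the names here are illustrative, not the ones from that project):

-- Which staged tables to compare.
CREATE TABLE meta_dataset (
    dataset_id int IDENTITY PRIMARY KEY,
    table_name sysname NOT NULL       -- e.g. 'students'
);

-- Which columns make up the business key of each table.
CREATE TABLE meta_business_key (
    dataset_id  int NOT NULL REFERENCES meta_dataset (dataset_id),
    column_name sysname NOT NULL,
    data_type   sysname NOT NULL      -- expected type, e.g. 'varchar'
);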
This way you can dynamically construct a SQL query (e.g., with an SSIS script component) that performs a full outer join to find the differences between the two datasets.
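The generated comparison query could then look something like this, with LastName + DateOfBirth standing in for a business key and Grade for a tracked column (all hypothetical):

-- Rows whose key exists on only one side, or whose non-key values differ.
SELECT s.LastName, s.DateOfBirth,
       s.Grade AS staged_grade, d.Grade AS existing_grade
FROM staging.students AS s
FULL OUTER JOIN dbo.students AS d
     ON  d.LastName    = s.LastName
     AND d.DateOfBirth = s.DateOfBirth
WHERE s.LastName IS NULL                              -- deleted row
   OR d.LastName IS NULL                              -- new row
   OR EXISTS (SELECT s.Grade EXCEPT SELECT d.Grade);  -- changed value, NULL-safe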
You can join your own meta data with SQL Server's meta data (using sys.* or INFORMATION_SCHEMA.*) to detect if the columns still exist in the source and the data types are as you anticipated.
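For example, a check along these lines (reusing the hypothetical meta tables from above):

-- Flag expected columns that are missing from the source or have drifted in type.
SELECT d.table_name, k.column_name,
       k.data_type AS expected_type, c.DATA_TYPE AS actual_type
FROM meta_dataset AS d
JOIN meta_business_key AS k ON k.dataset_id = d.dataset_id
LEFT JOIN INFORMATION_SCHEMA.COLUMNS AS c
     ON  c.TABLE_NAME  = d.table_name
     AND c.COLUMN_NAME = k.column_name
WHERE c.COLUMN_NAME IS NULL         -- column no longer exists
   OR c.DATA_TYPE <> k.data_type;   -- type is not what we anticipated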
Redirect unmatched metadata to an error flow for evaluation.
This way of working is risky, but it can be done if you maintain your metadata well.
If you want to compare two tables to see what is different, the keyword is EXCEPT:
select col1,col2,... from table1
except
select col1,col2,... from table2
This gives you everything in table1 that is not in table2.
select col1,col2,... from table2
except
select col1,col2,... from table1
This gives you everything in table2 that is not in table1.
Assuming you have some kind of useful, durable primary key on the two tables: a key that appears in both difference sets is a changed row; a key that appears only in the first set is an insert; a key that appears only in the second set is a delete.
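A sketch of how the two EXCEPT sets can be combined to classify each difference, assuming col1 is that durable key:

WITH in_new AS (SELECT col1, col2 FROM table1
                EXCEPT
                SELECT col1, col2 FROM table2),
     in_old AS (SELECT col1, col2 FROM table2
                EXCEPT
                SELECT col1, col2 FROM table1)
SELECT COALESCE(n.col1, o.col1) AS col1,
       CASE WHEN o.col1 IS NULL THEN 'insert'
            WHEN n.col1 IS NULL THEN 'delete'
            ELSE 'update'
       END AS change_type
FROM in_new AS n
FULL OUTER JOIN in_old AS o ON o.col1 = n.col1;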