I have a financial application that has a large set of rules to check. The files are stored in SQL Server, and the application is a C# web app. Each file must be checked against these rules, and there are hundreds of rules to consider. The rules change every few weeks to months. My thought was to store the rules in an XML file and have my code-behind read the XML and dynamically generate the SQL queries against the file. For testing purposes we are hard-coding the rules, but we'd like to move to an architecture that better accommodates rule changes. I think XML is a good way to go here, but I'd appreciate advice from those who have gone down similar roads before.
Each rule check is small in complexity and is generally a simple statement such as: "If A && B && (C || D), then write an output string to the log file".
My thought is to encode the query in XML (A && B && (C || D)) and attach a string to that node. If the query matches, the string is written; if the query does not match, nothing is written.
Thoughts?
In response to a comment, here is a more specific example:
The database has an entity called 'assets'. A number of asset types are supported, such as checking, savings, 401k, IRA, etc. An example of a rule we want to check would be: "If the file has a 401k, append warning text to the report." That example is a really simple case.
We also get into more complex and dynamic cases where, for a short period of time, a rule may be applied to deny files for clients in specific states with specific property types. A classic example is to not allow condominiums in Florida. This rule may exist for a while, then be removed.
The pool of rules is constantly changing at the discretion of large lending banks. We need to be able to make these rule changes outside of the site's source code; hence my idea of using XML and having the C# parse the XML and apply the rules dynamically. Does this help clarify the application and its needs?
could you just have a table with SQL in it? you could then formalise it a bit by having the SQL return a particular structure..
so your table of checks might be:
id | checkGroup    | checkName      | sql
---+---------------+----------------+---------------------------------------------------------------------------------
1  | '401k checks' | '401k present' | select '401k present', count(*), 'remove 401k' from assets where x like '401k%'
you could insist that the sql in the sql column returns something in the format:
ruleName       | count | comment
'401k present' | 85    | 'remove 401k'
you could have different types of rules.. when I have done something similar to this I have not returned totals; instead I have returned something more like:
table    | id | ruleBroken     | comment
'assets' | 1  | '401k present' | 'remove 401k'
this obviously would have a query more like:
select
    'assets'
    ,id
    ,'401k present'
    ,'remove 401k'
from
    assets
where
    x like '401k%'
this makes it easier to generate interactive reports where the aggregate functions are done by the report (e.g. SSRS), allowing drill-down to problem records.
the queries that validate the rules can either be run within a stored procedure that selects the queries out and uses EXEC to execute them, or they can be run from your application code one by one.
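for example, a minimal sketch of the stored-procedure route (the checks table is the one above; the checkResults results table is a hypothetical name):

create proc RunChecks as
begin
    declare @sql nvarchar(max)
    declare check_cursor cursor for select [sql] from checks
    open check_cursor
    fetch next from check_cursor into @sql
    while @@fetch_status = 0
    begin
        -- each stored query returns the agreed table | id | ruleBroken | comment shape
        insert into checkResults ([table], id, ruleBroken, comment)
        exec (@sql)
        fetch next from check_cursor into @sql
    end
    close check_cursor
    deallocate check_cursor
end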
some of the columns (e.g. rule name) can be populated by the calling stored procedure or code.
the comment and ruleName in this example are basically the same, but it can be handy to have the comment separate and put a case statement in there - e.g. when failing validation rules on fields that should not be blank if you have a 401k, a case statement can say in the comment which fields are missing.
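for instance, a sketch along those lines (the beneficiary and balance columns are hypothetical):

select
    'assets'
    ,id
    ,'401k fields populated'
    ,case
        when beneficiary is null and balance is null then 'beneficiary and balance missing'
        when beneficiary is null then 'beneficiary missing'
        else 'balance missing'
     end
from
    assets
where
    x like '401k%'
    and (beneficiary is null or balance is null)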
If you want end users or non-devs to create the rules, you could look at generating the where clause in code: let the user select the table and rule name and build the where clause through some interface, then save it to your rule table and you are good to go.
if all of your rules return a set format it allows you to have one report template for all rules - equally, if you have exactly three types of rule then you could have three return formats and three report templates.. basically I like formalizing the result structure as it allows much more reuse elsewhere
I have a db table containing these data
| id | curse_word |
----------------------
| 1 | d*mn |
| 2 | sh*t |
| 3 | as*h*le |
I am creating a website that behaves sort of like an online forum, where people can create discussion threads and talk in them. To be able to post you need to register, and to register you need a forum username. We want to prevent usernames from containing curse words anywhere in them. These curse words are defined in our database.
To check for this using the table above, I thought of using an SQL query with LIKE.
But what if the registered username is something like "somesh*ttyperson"? Since the word sh*t is in it, the username should not be allowed. So this is something like using an SQL query with a reverse LIKE.
I tried the following command, but it won't work:
select * from table where "somesh*ttyperson" LIKE curse_word
How can I make it work?
Although I'd give Tomalak's comment some consideration, here's a solution that might fit your needs:
SELECT COUNT(*) FROM curse_words
WHERE "somesh*ttyperson" LIKE CONCAT('%', curse_word, '%');
In this way you are actually composing a LIKE comparison term for each of the curse words by prepending and appending a % (e.g. %sh*t%).
LIKE might be a bit expensive to query if you plan on having millions of curse words but I think it's reasonable to assume you aren't.
All you have to do now is test for this count being strictly equal to 0 to let the nickname through, or forbid it otherwise.
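For instance, run from application code with a bound parameter (the ? placeholder syntax varies by driver; this is just a sketch):

SELECT COUNT(*) AS hits
FROM curse_words
WHERE ? LIKE CONCAT('%', curse_word, '%');
-- bind the candidate username to ?; hits = 0 means the name is allowed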
I have the following table schema:
+-----+---------+----------+
+ chn | INTEGER | NULLABLE |
+-----+---------+----------|
+ size| STRING | NULLABLE |
+-----+---------+----------|
+ char| REPEATED| NULLABLE |
+-----+---------+----------|
+ ped | INTEGER | NULLABLE |
+-----+---------+----------
When I click on 'preview' in the Google BigQuery Web UI, I get the following result:
But when I query my table, I get this result:
It seems like "preview" interprets my repeated field as an array; I would like to get the same rendering from a query, to limit the number of rows.
I did try unchecking "Use Legacy SQL", which gave me the same result, but the problem is that with my table the same query takes ~1.0 sec to execute with "Use Legacy SQL" checked and ~12 seconds when it's unchecked.
I am looking for speed here so unfortunately, not using Legacy SQL is not an option...
Is there another way to render my repeated field like it does in the "preview" ?
Thanks for the help :)
In legacy SQL, BigQuery flattens the result of queries by default. This means two things:
All child fields of RECORD fields are propagated to the top-level, with their names changed from record.subrecord.leaf to record_subrecord_leaf. Parent records are removed from the schema.
All repeated fields are converted to fields of optional mode, with each repeated value expanded into its own row. (As a side note, this step is very similar to the FLATTEN function exposed in legacy SQL.)
What you see here is a product of #2. Each repeated value is becoming its own row (as you can see by the row count on the left-hand side in your two images) and the values from the other columns are, well, repeated for each new row.
You can prevent this behavior and receive "unflattened results" in a couple ways.
Using standard SQL, as you note in your original question. All standard SQL queries return unflattened results.
While using legacy SQL, setting the flattenResults parameter to false. This also requires specifying a destination table and setting allowLargeResults to true. These can be found in the Show Options panel beneath the query editor if you want to set them within the UI. Mikhail has some good suggestions for managing the temporary-ness of destination tables if you aren't interested in keeping them around.
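For example, a standard SQL query along these lines (table name hypothetical) returns the repeated char field as an array, one row per record:

#standardSQL
SELECT chn, size, char, ped
FROM `myproject.mydataset.mytable`
LIMIT 10;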
I should note that there are a number of corner cases with legacy SQL with flattenResults set to false which might trip you up if you start writing more complex queries. A prominent example is that you can't output more than one independently repeated field in query results using legacy SQL, but you can output multiple with standard SQL. These issues are unlikely to be resolved in legacy SQL, and going forward we're suggesting people use standard SQL when they run into them.
If you could provide more details about your much slower query using standard SQL (e.g. job ID in legacy SQL, job ID in standard SQL, for comparison), I, and the rest of the BigQuery team, would be very interested in investigating further.
Is there another way to render my repeated field like it does in the "preview"?
To see the original, non-flattened output in the Web UI for legacy SQL, I used to set the respective options (click Show Options) to write the output to a table, with Allow Large Results checked and Flatten Results unchecked.
This not only saves the result into a table but also shows the result the same way the preview does (because it is actually a preview of that table). To make sure the table gets removed afterwards, I have a "dedicated" dataset (temp) with a default expiration set to 1 day (or 1 hour, depending on how aggressive you want to be with your junk), so you don't need to worry about those tables - they get deleted automatically for you. Worth noting: this was quite a common pattern for us, and having to set the extra options each time was tedious, so we ended up with our own custom UI that does all of this for the user automatically.
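A sketch of creating such a dataset with the bq command-line tool (dataset name assumed; the expiration is in seconds):

bq mk --default_table_expiration 86400 temp   # tables in "temp" auto-expire after one day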
What you see is called flattening.
By default the UI flattens the query output; there is currently no option to show query results the way you want. In order to produce unflattened results you must write them to a table, but that's a different thing.
I'll describe my scenario so you guys understand what type of design pattern I'm looking for.
I'm making an application where I provide someone with a link that is associated with one or more files. For example, someone needs somePowerpoint.ppx, main.cpp, and somevid.mp4; I have a tool that generates kj13h1djdsja213j1hhadad9933932 and associates it with those 3 files, so that I can give someone
mysite.com/getfiles?fid=kj13h1djdsja213j1hhadad9933932
and they'll get a list of those files that they can download individually or all at once.
Since I'm new to SQL, the only way I know of doing that is having my tool use a table like
fid | filename
------------------------------------------------------------------
kj13h1djdsja213j1hhadad9933932 somePowerpoint.ppx
kj13h1djdsja213j1hhadad9933932 main.cpp
kj13h1djdsja213j1hhadad9933932 somevid.mp4
jj133823u22h248884h4h24h01h232 someotherfile.someextension
to go along with the above example. It would be nice if I could do some equivalent of
fid | filename(s)
---------------------------------------------------------------------------
kj13h1djdsja213j1hhadad9933932 somePowerpoint.ppx, main.cpp, somevid.mp4
jj133823u22h248884h4h24h01h232 someotherfile.someextension
but I'm not sure if that's possible or if I should be using some other design pattern altogether.
Any advice?
I believe Concatenate many rows into a single text string? can help give you a query that would generate your condensed format (you'd still want to store the full list in SQL, but you could create a view showing the condensed version using the query in the link)
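On SQL Server 2017+, a sketch of such a view might look like the following (the files table name is assumed; older versions would use the FOR XML PATH trick from the linked answer):

CREATE VIEW file_lists AS
SELECT fid,
       STRING_AGG(filename, ', ') AS filenames
FROM files
GROUP BY fid;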
I'm trying to create a set of data that I'm going to write out to a file. It's essentially a report composed of various fields from a number of different tables; some columns need to have some processing done on them, and some can just be selected.
Different users will likely want different processing performed on certain columns, and in the future I'll probably need to add additional functions for computed columns.
I'm considering the cleanest/most flexible approach to storing and using all the different functions I'm likely to need for these computed columns. I've got two ideas in my head, but I'm hoping there might be a much more obvious solution I'm missing.
For a simple, slightly odd example, a Staff table:
Employee | DOB | VacationDays
Frank | 01/01/1970 | 25
Mike | 03/03/1975 | 24
Dave | 05/02/1980 | 30
I'm thinking I'd either end up with a query like
SELECT NameFunction(Employee, optionID),
       DOBFunction(DOB, optionID),
       VacationFunction(VacationDays, optionID)
FROM Staff
With user defined functions, where the optionID would be used in a case statement inside the functions to decide what processing to perform.
Or I'd want to make the way the data is returned customisable using a lookup table of other functions:
ID | Name                  | Description
1  | ShortName             | Obtains 3-letter abbreviation of employee name
2  | LongDOB               | Returns DOB in format ~ 1st January 1970
3  | TimeStampDOB          | Returns timestamp for DOB
4  | VacationSeconds       | Returns seconds of vacation time
5  | VacationBusinessHours | Returns number of business hours of vacation
Which seems neater, but I'm not sure how I'd formulate the query, presumably using dynamic SQL? Is there a sensible alternative?
The functions will be used on a few thousand rows.
The closest answer I've found was in this thread:
Call dynamic function name in SQL
I'm not a huge fan of dynamic SQL, although in this case I think it might be the best way to get the result I'm after?
Any replies appreciated,
Thanks,
Chris
I would go for the second solution. You could even use real stored proc names in your lookup table.
create proc ShortName (
    @param varchar(50)
) as
begin
    select 'ShortName: ' + @param
end
go
declare @proc sysname = 'ShortName'
exec @proc 'David'
As you can see in the example above, the first parameter of exec (i.e. the procedure name) can be a variable. This said, with all the usual warnings regarding dynamic SQL...
In the end, you should go with whichever is faster, so you should try both ways (and any other way someone might come up with) and decide after that.
I like the first option better, as long as your functions don't have extra selects against a table. You may not even need the user-defined functions if they are not going to be reused in a different report.
I prefer to use dynamic SQL only to improve a query's performance, such as adding dynamic ordering or adding/removing complex WHERE conditions.
But these are all subjective opinions; the best thing is to try, compare, and decide.
Actually, this isn't a question of what's faster. It is a question of what makes the code cleaner, particularly for adding new functionality (new columns, new column formats, re-ordering them).
Don't think of your second approach as "using dynamic SQL", since that tends to have negative connotations. Instead, think of it as a data-driven approach. You want to build a table that describes the columns that users can get, and the formats. This is great! Users can then provide a list of columns, and you'll have a magical stored procedure that combines the information from the users with the information in your metadata table, and produces the desired result.
I'm a big fan of data-driven approaches, and dynamic SQL is the best SQL tool I've found so far for implementing them.
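As a minimal sketch of that data-driven approach, assume the lookup table above gains an Expression column holding the SQL fragment for each computed column (all names here are hypothetical):

declare @cols nvarchar(max), @sql nvarchar(max);
-- build the select list from the metadata rows the user picked
select @cols = string_agg(Expression + ' as [' + Name + ']', ', ')
from ReportColumns
where ID in (1, 2, 5);
set @sql = N'select ' + @cols + N' from Staff';
exec sp_executesql @sql;   -- string_agg needs SQL Server 2017+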
I'm currently working on improving my database to make room for growth. As it stands, different users have different 'permissions' to areas of the website. Some users have permissions to multiple areas of the website.
I'd like some feedback if I'm doing this in the most efficient way:
tblUsers:
usrID usrFirst usrLast phone //etc....
1 John Doe
2 Jane Smith
3 Bill Jones
tblAreas:
id name
1 Marketing
2 Support
3 Human Resources
4 Media Relations
tblPermissions:
id usrID areaID
1 1 2
2 1 4
3 2 1
4 3 3
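For illustration, the lookup this schema supports would be along these lines:

SELECT a.name
FROM tblPermissions p
JOIN tblAreas a ON a.id = p.areaID
WHERE p.usrID = 1;   -- John Doe -> Support, Media Relations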
Right now, for each "area", I have separate directories. However, I'd like to minimize all of these directories down to one main directory, and then redirect users on logging in to their appropriate 'area' based upon their permissions.
Does it sound like I'm doing this correctly? I've never created a multi-layered site with different permissions and different groups of people, so I'm certainly open to learning more about how to do this correctly.
Thanks very much!
The general design is ok. The issues that pop out on me relate to naming.
SQL doesn't need Hungarian notation -- it's generally considered unnecessary / bad (tblUsers -> users).
I wouldn't prefix table names to column names ...
... except for the "id" column, which should always include your table name (i.e. areaId)
Your "first" and "last" columns don't make sense (hint: firstName)
I'd rename tblPermissions -> userAreas
Depending on your programming language and database, I'd also recommend using underscore instead of capitalization for your table/column-names.
As for using separate directories for different groups, I'd advise against it. Have the security-checks in your code instead of your directory layout.
Reasoning:
What happens when somebody decides that support is also allowed to do some marketing stuff? Should you change your code, or add a record into your database?
Or what if you have overlapping actions?
@brianpeiris: A couple of things come to mind:
No need for column aliases in JOINs
Makes it easier to search through code ("foo_id" gives less results than "id")
JOIN USING (foo_id) instead of JOIN ON (foo.id = bar.foo_id).
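A sketch of the difference (table and column names hypothetical; USING is supported in e.g. MySQL and PostgreSQL, not T-SQL):

-- with keys named after their table, the join condition is implicit:
SELECT u.first_name, a.name
FROM users u
JOIN user_areas ua USING (user_id)
JOIN areas a USING (area_id);
-- with every key called "id", each join needs an explicit ON:
SELECT u.first_name, a.name
FROM users u
JOIN user_areas ua ON ua.user_id = u.id
JOIN areas a ON a.id = ua.area_id;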
The schema looks fine.
I would suggest that you put access control in the controller and base it off of the URL path, so that you are not coding it into every section.
Yes, this seems like it is addressing your need perfectly from the database side.
The challenge will be using the data as simply and declaratively as possible. Where is the right place to declare which "area" you are in? Does each page do this, is there a function that calculates it, or can your controllers do it, as someone suggests? The second part is evaluating the current user against this. Ideally you end up with a single function like security_check_for_area(4) that does it all.