I have taken over a position as a BI consultant, and almost ALL of the prior SQL is built in a very funny way, especially compared to what I have been taught. I wonder if the other "way" has a particular name, for the times when I try to explain how the old things are built and why they are so complicated to change, compared to the way I build things, where changes happen pretty quickly.
I have always "learned" to write one script, with CTEs, derived tables, subqueries, etc. But all of the previous code is built in many, many steps.
Even simple tasks are built up in MANY steps.
For example, if they wanted to find info from three different tables, it could look like this (simplified for example reasons; I have scripts I have rewritten from 56 steps down to 2 steps and gotten a 76% performance boost):
Create Table qtemp.cust as (
    Select
        CustNo,
        CustName
    From Customer
) With Data
;
Alter Table qtemp.cust
    Add Column Revenue Decimal(15, 2)
;
Update qtemp.cust A
Set Revenue = (Select sum(Revenue) From Sales B Where A.CustNo = B.CustNo)
;
Insert into F_SALES
Select * From qtemp.cust
It's not that they have been idiots; for the statement above, where all the info is in the Sales table, they would just group by and take the info from there. But in general all the code is built like the above, and qtemp tables are often used instead of subqueries or CTEs. For lack of a better word, I call the old work a "stepped process" and my own work a "one-step process".
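For comparison, the one-step version I would write for that same simplified example looks roughly like this (just a sketch, assuming the Revenue figures come from the Sales table as in the update step above):
Insert into F_SALES
Select c.CustNo,
       c.CustName,
       sum(b.Revenue) as Revenue
From Customer c
Left join Sales b
       on b.CustNo = c.CustNo
Group by c.CustNo,
         c.CustName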
So is there a terminology for it?
Also, it's on an old DB2 server, which is now on the newest DB2 version. But is it legacy from a time when DB2 didn't support derived tables, CTEs, subqueries, etc.? I'm very curious why anyone would build it like the above, and whether there is a name for it.
Related
I have some queries that I want to run in a sequential manner. Is it possible to schedule multiple queries under one scheduled query in BigQuery? Thanks
If you don't need all of the intermediate tables and are just interested in the final output... consider using CTEs.
with first as (
select *, current_date() as todays_date from <table1>
),
second as (
select current_date(), concat(field1,field2) as new_field, count(*) as ct
from first
group by 1,2
)
select * from second
You can chain together as many of these as needed.
If you do need all of these intermediate tables materialized, you are venturing into ETL and orchestration tools (dbt, airflow, etc) or will need to write a custom script to execute several commands sequentially.
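For instance, with dbt each intermediate step becomes its own model file that simply selects from the previous one, and dbt handles materializing and ordering them. A rough sketch, reusing the hypothetical first/second names from the CTE example above:
-- models/second.sql (dbt materializes this from the "first" model)
select
    current_date() as todays_date,
    concat(field1, field2) as new_field,
    count(*) as ct
from {{ ref('first') }}
group by 1, 2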
Not currently, but an alpha program for scripting support in BigQuery was announced at Google Cloud Next in April. You can follow the relevant feature request for updates. In the meantime, you could consider using Cloud Composer to execute multiple sequential queries or an App Engine cron with some code to achieve sequential execution on a regular basis.
Edit (October 2019): support for scripting and stored procedures is now in beta. You can submit multiple queries separated with semi-colons and BigQuery is able to run them now.
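As a minimal sketch of what that looks like, two statements separated by a semicolon can be submitted as one job (the dataset and table names here are made up for illustration):
-- Statement 1: rebuild a staging table
CREATE OR REPLACE TABLE mydataset.staging AS
SELECT *, CURRENT_DATE() AS load_date
FROM mydataset.source_table;

-- Statement 2: append the staged rows to a final table
INSERT INTO mydataset.final_table
SELECT * FROM mydataset.staging;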
I'm not 100% sure if this is what you're looking for, but I'm confident that you won't need to orchestrate many queries to do this. It may be as simple as using the INSERT...SELECT syntax, like this:
INSERT INTO
YourDataset.AdServer_Refine
SELECT
Placement_ExtID,
COALESCE(m.New_Ids,a.Placement_ExtID) AS New_Ids,
m.Labels,
-- repeat the COALESCE (aliases can't be reused within the same SELECT list) and cast the date for CONCAT
CONCAT(CAST(a.Date AS STRING), " - ", COALESCE(m.New_Ids, a.Placement_ExtID)) AS Concatenated,
a.Placement_strategy,
a.Campaign_Id,
a.Campaign,
a.Cost,
a.Impressions,
a.Clicks,
a.C_Date AS Current_date,
a.Date
FROM
YourDataset.AdServer AS a
LEFT JOIN
YourDataset.Matching AS m
USING(Placement_ExtID)
WHERE
a.Date = CURRENT_DATE()
This will insert all the rows that are output from the SELECT portion of the query (and you can easily test the output by just running the SELECT on its own).
Another option is to create a scheduled query that outputs to your desired table from the SELECT portion of the query above.
If that isn't doing what you're expecting, please clarify the question and leave a comment and I'm happy to try to refine the answer.
I have joined a new job where I am required to use FileMaker (and gradually transition systems to other databases). I have been a DB Admin of an MS SQL Server database for ~2 years, and I am very well versed in PL/SQL and T-SQL. I am trying to carry my SQL knowledge over to FMP using the ExecuteSQL functionality, and I'm kinda running into a lot of small pains :)
I have 2 tables: Movies and Genres. The relevant columns are:
Movies(MovieId, MovieName, GenreId, Rating)
Genres(GenreId, GenreName)
I'm trying to find the movie with the highest rating in each genre. The SQL query for this would be:
SELECT M.MovieName
FROM Movies M INNER JOIN Genres G ON M.GenreId=G.GenreId
WHERE M.Rating=
(
SELECT MAX(Rating) FROM Movies WHERE GenreId = M.GenreId
)
I translated this as best as I could to an ExecuteSQL query:
ExecuteSQL ("
SELECT M::MovieName FROM Movies M INNER JOIN Genres G ON M::GenreId=G::GenreId
WHERE M::Rating =
(SELECT MAX(M2::Rating) FROM Movies M2 WHERE M2::GenreId = M::GenreId)
"; "" ; "")
I set the field type to Text and also ensured values are not stored. But all I see are '?' marks.
What am I doing incorrectly here? I'm sorry if it's something really stupid, but I'm new to FMP and any suggestions would be appreciated.
Thank you!
--
Ram
UPDATE: Solution and the thought process it took to get there:
Thanks to everyone who helped me solve the problem. You guys made me realize that the traditional SQL thought process does not map exactly onto FMP, and when I probed around, I realized that to best use SQL knowledge in FMP, I should consider each column independently and not think of the entire result set when I write a query. For my current functionality, this means the JOIN is no longer necessary. The JOIN was there to bring in the GenreName, which is a different column that FMP maps automatically. I just needed to remove the JOIN, and it works perfectly.
TL;DR: The thought process context should be the current column, not the entire expected result set.
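To make that concrete, the working query ended up being roughly this shape (a sketch based on the description above, not the exact calculation from my file):
ExecuteSQL ( "
    SELECT M.MovieName
    FROM Movies AS M
    WHERE M.Rating =
        ( SELECT MAX(M2.Rating) FROM Movies AS M2 WHERE M2.GenreId = M.GenreId )
" ; "" ; "" )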
Once again, thank you #MissJack, #Chuck (how did you even get that username?), #pft221 and #michael.hor257k
I've found that FileMaker is very particular in its formatting of queries using the ExecuteSQL function. In many cases, standard SQL syntax will work fine, but in some cases you have to make some slight (but important) tweaks.
I can see two things here that might be causing the problem...
ExecuteSQL ("
SELECT M::MovieName FROM Movies M INNER JOIN Genres G ON
M::GenreId=G::GenreId
WHERE M::Rating =
(SELECT MAX(M2::Rating) FROM Movies M2 WHERE M2::GenreId = M::GenreId)
"; "" ; "")
You can't use the standard FMP table::field format inside the query.
Within the quotes inside the ExecuteSQL function, you should follow the SQL format of table.column. So M::MovieName should be M.MovieName.
I don't see an AS anywhere in your code.
In order to create an alias, you must state it explicitly. For example, in your FROM, it should be Movies AS M.
I think if you fix those two things, it should probably work. However, I've had some trouble with JOINs myself, as my primary experience is with FMP, and I'm only just now becoming more familiar with SQL syntax.
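Putting those two fixes together, the query would look something like this (untested; just to show the shape):
ExecuteSQL ( "
    SELECT M.MovieName
    FROM Movies AS M
    INNER JOIN Genres AS G ON M.GenreId = G.GenreId
    WHERE M.Rating =
        ( SELECT MAX(M2.Rating) FROM Movies AS M2 WHERE M2.GenreId = M.GenreId )
" ; "" ; "" )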
Because it's incredibly hard to debug SQL in FMP, the best advice I can give you here is to start small. Begin with a very basic query, and once you're sure that's working, gradually add more complicated elements one at a time until you encounter the dreaded ?.
There's a number of great posts on FileMaker Hacks all about ExecuteSQL:
Since you're already familiar with SQL, I'd start with this one: The Missing FM 12 ExecuteSQL Reference. There's a link to a PDF of the entire article if you scroll down to the bottom of the post.
I was going to recommend a few more specific articles (like the series on Robust Coding, or Dynamic Parameters), but since I'm new here and I can't include more than 2 links, just go to FileMaker Hacks and search for "ExecuteSQL". You'll find a number of useful posts.
NB If you're using FMP Advanced, the Data Viewer is a great tool for testing SQL. But beware: complex queries on large databases can sometimes send it into fits and freeze the program.
The first thing to keep in mind when working with FileMaker and ExecuteSQL() is the difference between tables and table occurrences. This is a concept that's somewhat unique to FileMaker. Succinctly, tables store the data, but table occurrences define the context of that data. Table occurrences are what you're seeing in FileMaker's relationship graph, and the ExecuteSQL() function needs to reference the table occurrences in its query.
I agree with MissJack regarding the need to start small in building the SQL statement and use the Data Viewer in FileMaker Pro Advanced, but there's one more recommendation I can offer, which is to use SeedCode's SQL Explorer. It does require the adding of table occurrences and fields to duplicate the naming in your existing solution, but this is pretty easy to do and the file they offer includes a wizard for building the SQL query.
The problem story goes like this:
Consider a program to manage bank accounts with balance limits for each customer: {table Customers, table Limits}, where for each Customer.id there is one Limit record.
Then the client asked to store a history of the limits' changes. It's not a problem, since I already had a date column on Limits, but the view/query for the active (latest) limits needs to be changed.
Before: Customer-Limit was 1 to 1, so a simple select did the job.
Now: it would show all the Limits records, which means multiple records for each Customer, and I need the latest Limits only, so I thought of something like this pseudocode:
foreach( id in Customers)
{
select top 1 *
from Limits
where Limits.customer_id = id
order by Limits.date desc
}
but while looking through SO for similar issues, I came across stuff like
"95% of the time when you need a looping structure in tSQL you are probably doing it wrong"-JohnFx
and
"SQL is primarily a set-orientated language - it's generally a bad idea to use a loop in it."-Mark Bannister
Can anyone confirm/explain why it is wrong to loop? And in the problem explained above, what am I getting wrong that makes me think I need a loop?
thanks in advance
UPDATE: my solution
In light of TomTom's answer and the suggested link here, and before Dean kindly answered with code, I came up with this:
SELECT *
FROM Customers c
LEFT JOIN Limits a ON a.customer_id = c.id
AND a.date =
(
SELECT MAX(date)
FROM Limits z
WHERE z.customer_id = a.customer_id
)
thought I'd share :>
thanks for your response,
happy coding
Will this do?
;with l as (
select *, row_number() over(partition by customer_id order by date desc) as rn
from limits
)
select *
from customers c
left join l on l.customer_id = c.id and l.rn = 1
I am assuming that earlier (i.e. before implementing the history functionality) you must have been updating the Limits table. Now, for implementing the history functionality, you have started inserting new records. Doesn't this trigger a lot of changes in your database and code?
Instead of inserting new records, how about keeping the original functionality as is and creating a new table, say Limits_History, which will store all the old values from the Limits table before updating it? Then all you need to do is fetch records from this table if you want to show history. This will not cause any changes in your existing SPs and code, and hence will be less error prone.
To insert records into the Limits_History table, you can simply create an AFTER UPDATE trigger and use the deleted magic table. Hence you need not worry about calling an SP or something to maintain history; the trigger will do this for you. Good examples of triggers are here
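A minimal sketch of such a trigger (T-SQL; I'm assuming Limits has columns customer_id, limit_amount and date, and that Limits_History mirrors them):
CREATE TRIGGER trg_Limits_History
ON Limits
AFTER UPDATE
AS
BEGIN
    -- 'deleted' holds the pre-update values of the changed rows
    INSERT INTO Limits_History (customer_id, limit_amount, [date])
    SELECT customer_id, limit_amount, [date]
    FROM deleted;
END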
Hope this helps
It is wrong. You can do the same by querying Customers and Limits with a subquery that limits the join to the most recent record in Limits.
This is similar in concept to the query presented in Most recent record in a left join
You may have to do so in two joins - get the most recent date, then get the limit for that date. While this may look complex, it is a beginner issue; talk complex when you have SQL statements reaching two printed pages and more ;)
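A sketch of that two-step idea (the limit_amount column name is just a placeholder):
SELECT c.id, l.limit_amount, l.date
FROM Customers c
JOIN (
    SELECT customer_id, MAX(date) AS max_date
    FROM Limits
    GROUP BY customer_id
) latest ON latest.customer_id = c.id
JOIN Limits l ON l.customer_id = latest.customer_id
             AND l.date = latest.max_date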
Now, for an operational system the table design is broken - Limits should contain only the most recent limit, and a LimitHistory table the historical (or: all) entries, allowing fast retrieval of the CURRENT limit (which will be the one to apply to all transactions) without the overhead of the history. The table design you have treats all limit records the same - that may be the truth (is the truth) for a reporting data warehouse, but is wrong for a transactional system, as the history is not transacted.
Confirmation of why looping is wrong is exactly in the quoted parts of your question - SQL is a set-orientated language.
This means when you work on sets there's no reason to loop through the single rows, because you already have the 'result' (set) of data you want to work on.
Then the work you are doing should be done on the set of rows, because otherwise your selection is wrong.
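To illustrate the contrast (T-SQL, with a made-up credit_flag column and limit_amount threshold): the single set-based statement below does in one pass what a cursor or WHILE loop would do row by row.
UPDATE c
SET    c.credit_flag = 1
FROM   Customers AS c
WHERE  EXISTS (SELECT 1
               FROM Limits AS l
               WHERE l.customer_id = c.id
                 AND l.limit_amount > 10000);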
That being said, there are of course situations where looping is done in SQL, and it will generally be done via cursors if looping over data, or via a WHILE loop if calculating stuff (generally speaking; there are always exceptions).
However, as also mentioned in the quotes, often when you feel like using a loop you either shouldn't (it's poor performance) or you're doing logic in the wrong part of your application.
Basically - it is similar to how object-orientated languages work on objects and references to said objects. A set-based language works on - well, sets of data.
SQL is basically made to function in that manner - query relational data into result sets - so when working with the language, you should let it do what it can do and work on that. Just as if it was Java or any other language.
Wondering if anyone may be able to help me with some SQL here. I'm tasked with retrieving some data from a legacy DB system - It's an IBM Informix DB running v7.23C1. It may well be that what I'm trying to do here is pretty simple, but for the life of me I can't figure it out.
I'm used to MS SQL Server, rather than any other DB system and this one seems quite old: http://publib.boulder.ibm.com/epubs/pdf/3731.pdf (?)
Basically, I just want to run a query that includes nesting, but I can't seem to figure out how to do this. So for example, I have a query that looks like this:
SELECT cmprod.cmp_product,
(stock.stk_stkqty - stock.stk_allstk) stk_bal,
stock.stk_ospurch,
stock.stk_backord,
'Current Sales Period',
'Current Period -1',
'Current Period -2',
cmprod.cmp_curcost,
stock.stk_credate,
stock.stk_lastpurch,
stock.stk_binref
FROM informix.stock stock,
informix.cmprod cmprod
WHERE stock.stk_product = cmprod.cmp_product
AND (cmp_category = 'VOLV'
OR cmp_category = 'VOLD'
OR cmp_category = 'VOLA')
AND stk_loc = 'ENG';
Now, basically where I have values like 'Current Period -1' I want to include a nested field which will run a query to get the sales within a given date range. I'm sure I can put those pieces together separately, but I can't seem to get the compiler to be happy with my code when I execute it all together.
Probably something like (NB, this specific query is for another column, but you get the idea):
SELECT s.stmov_product, s.stmov_trandate, s.stmov_qty
FROM informix.stmove s
WHERE s.stmov_product = '1066823'
AND s.stmov_qty > 0
AND s.stmov_trandate IN (
SELECT MAX(r.stmov_trandate)
FROM informix.stmove r
WHERE r.stmov_product = '1066823'
AND r.stmov_qty > 0)
What makes things a little worse is I don't have access to the server that this DB is running on. At the moment I have a custom C# app that connects via an ODBC driver and executes the raw SQL, parsing the results back into a .CSV.
Any and all help appreciated!
By any measure, Informix 7.23 is so geriatric that it is unkind to still be running it. It is not clear whether this is an OnLine (Informix Dynamic Server, IDS) or SE (Standard Engine) database. However, 7.23 was the version prior to the Y2K-certified 7.24 releases, so it is 15 years old or thereabouts, maybe a little older.
The syntaxes supported by Informix servers back in the days of 7.23 were less comprehensive than they are in current versions. Consequently, you'll need to be careful. You should have the manuals for the server — someone, somewhere in your company should. If not, you'll need to try finding it in the graveyard manuals section of the IBM Informix web pages (start at http://www.informix.com/ for simplicity of URL; however, archaic manuals take some finding, but you should be able to get there from http://pic.dhe.ibm.com/infocenter/ifxhelp/v0/index.jsp choosing 'Servers' in the LHS).
If you are trying to write:
SELECT ...
(SELECT ... ) AS 'Current - 1',
(SELECT ... ) AS 'Current - 2',
...
FROM ...
then you need to study the server SQL Syntax for 7.23 to know whether it is allowed. AFAICR, OnLine (Informix Dynamic Server) would allow it and SE probably would not, but that is far from definitive. I simply don't remember what the limitations were in that ancient a version.
Judging from the 7.2 Informix Guide to SQL: Syntax manual (dated April 1996 — 17 years old), you cannot put a (SELECT ...) in the select-list in this version of Informix.
You may have to create a temporary table holding the results you want (along with appropriate key information), and then select from the temporary table in the main query.
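For example, something along these lines (only a rough sketch; the period dates are placeholders and the exact INTO TEMP placement should be checked against the 7.2 syntax manual):
SELECT stmov_product, SUM(stmov_qty) AS cur_period_qty
FROM informix.stmove
WHERE stmov_trandate BETWEEN '01/01/2013' AND '01/31/2013'
  AND stmov_qty > 0
GROUP BY stmov_product
INTO TEMP t_cur_period;

SELECT cmprod.cmp_product, stock.stk_binref, t.cur_period_qty
FROM informix.stock stock, informix.cmprod cmprod, t_cur_period t
WHERE stock.stk_product = cmprod.cmp_product
  AND t.stmov_product = cmprod.cmp_product;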
This sort of thing is one of the problems with not updating your server for so long.
Sorry to be blunt, but can you at least have mercy on us by shortening the syntax? The table.column references can be given AS aliases. Meanwhile, if you can't upgrade to a newer version of Informix, you will have to rely on one or more SELECT ... INTO TEMP table queries in order to achieve your objective, which, BTW, would make your coding more portable across different versions. You also have to evaluate whether using TEMP tables implies unacceptable processing times.
Are there any formal techniques for refactoring SQL, similar to this list here that exists for code?
I am currently working on a massive query for a particular report and I'm sure there's plenty of scope for refactoring here which I'm just stumbling through myself bit by bit.
I have never seen an exhaustive list like the sample you provided.
The most effective way to refactor SQL that I have seen is to use the WITH statement.
It allows you to break the SQL up into manageable parts, which can frequently be tested independently. In addition, it can enable the reuse of query results, sometimes via a system temporary table. It is well worth the effort to examine.
Here is a silly example
WITH
mnssnInfo AS
(
SELECT SSN,
       UPPER(LAST_NAME) AS LAST_NAME,
       UPPER(FIRST_NAME) AS FIRST_NAME,
TAXABLE_INCOME,
CHARITABLE_DONATIONS
FROM IRS_MASTER_FILE
WHERE STATE = 'MN' AND -- limit to Minne-so-tah
TAXABLE_INCOME > 250000 AND -- is rich
CHARITABLE_DONATIONS > 5000 -- might donate too
),
doltishApplicants AS
(
SELECT SSN, SAT_SCORE, SUBMISSION_DATE
FROM COLLEGE_ADMISSIONS
WHERE SAT_SCORE < 100 -- Not as smart as the average moose.
),
todaysAdmissions AS
(
SELECT doltishApplicants.SSN,
TRUNC(SUBMISSION_DATE) SUBMIT_DATE,
LAST_NAME, FIRST_NAME,
TAXABLE_INCOME
FROM mnssnInfo,
doltishApplicants
WHERE mnssnInfo.SSN = doltishApplicants.SSN
)
SELECT 'Dear ' || FIRST_NAME ||
' your admission to WhatsaMattaU has been accepted.'
FROM todaysAdmissions
WHERE SUBMIT_DATE = TRUNC(SYSDATE) -- For stuff received today only
One of the other things I like about it is that this form allows you to separate the filtering from the joining. As a result, you can frequently copy out the subqueries and execute them standalone to view the result set associated with each one.
There is a book on the subject: "Refactoring Databases". I haven't read it, but it got 4.5/5 stars on Amazon and is co-authored by Scott Ambler, which are both good signs.
Not that I've ever found. I've mostly done SQL Server work and the standard techniques are:
Parameterise hard-coded values that might change (so the query can be cached) - see the sketch after this list
Review the execution plan, check where the big monsters are and try changing them
Index tuning wizard (but beware you don't cause chaos elsewhere from any changes you make for this)
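On the first point, a small sketch of what parameterising a hard-coded value looks like in SQL Server (the table and column names are made up):
DECLARE @Region nvarchar(10) = N'EMEA';

EXEC sp_executesql
    N'SELECT OrderId, Amount FROM dbo.Orders WHERE Region = @Region',
    N'@Region nvarchar(10)',
    @Region = @Region;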
If you're still stuck, many reports don't depend on 100% live data - try precalculating portions of the data (or the whole lot) on a schedule such as overnight.
Not about techniques as much, but this question might help you find SQL refactoring tools:
Is there a tool for refactoring SQL, a bit like a ReSharper for SQL