I have a SQL statement that pulls the data I need, but I can't get the syntax right in Crystal Reports.
This statement works in SQL:
SELECT
max([meter_reading])
FROM [Forefront].[dbo].[EC_METER_HISTORY_MC]
WHERE [Meter_Number] = '1' AND [Transaction_Date] < '20130101'
GROUP BY
[Company_Code], [Equipment_Code], [Meter_Number]
This is what I changed it to in Crystal, but I can't get the syntax right.
SELECT
Maximum({EC_METER_HISTORY_MC.meter_reading})
FROM [EC_METER_HISTORY_MC]
WHERE {EC_METER_HISTORY_MC.Meter_Number} = '1'
AND {EC_METER_HISTORY_MC.Transaction_Date} < {1?Startdate}
GROUP BY {EC_METER_HISTORY_MC.Company_Code}
,{EC_METER_HISTORY_MC.Equipment_Code}
,{EC_METER_HISTORY_MC.Meter_Number}
Your first step should be reading up on how SQL Expressions work in Crystal. Here is a good link to get you started.
A few of your problems include:
Using a parameter field: SQL Expressions are not compatible with CR parameters, so parameters cannot be used inside them.
SQL Expressions can only return a scalar value per row of your report, so your use of GROUP BY doesn't serve any purpose.
Your use of curly braces means that you're referencing those fields in the main report query instead of in the subquery you're trying to create with this expression.
Here's a simplified example that would find the max meter reading of a particular meter (written for Oracle, since that's what I know and you didn't specify which DB you're using):
case when {EC_METER_HISTORY_MC.Meter_Number} is null then null
else (select max(Meter_Reading)
from EC_METER_HISTORY_MC
where Meter_Number={EC_METER_HISTORY_MC.Meter_Number} --filter by the meter number from main query
and Transaction_Date < Current_Date) --filter by some date. CAN'T use parameter here.
end
You can't use parameter fields in a SQL Expression, sadly. Perhaps you can correlate the Transaction_Date to a table in the main query. Otherwise, I would suggest using a Command.
You have two options for the Command:
Use a single Command object as the data source for the whole report--which involves (potentially) a fair amount of rework.
Add a Command to the existing table set (in the Database 'Expert'). Link it to other tables as desired. This will perform a second SELECT and join the results in memory (WhileReadingRecords, if I'm not mistaken). The slight performance hit may be worth the benefit; see the sketch below.
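For example, a Command for your case might look like the following (an untested sketch, in SQL Server syntax to match your original query, with {?StartDate} created as a parameter inside the Command editor rather than as a report parameter):
SELECT Company_Code, Equipment_Code, Meter_Number,
       MAX(Meter_Reading) AS Max_Meter_Reading
FROM Forefront.dbo.EC_METER_HISTORY_MC
WHERE Meter_Number = '1'
  AND Transaction_Date < {?StartDate}
GROUP BY Company_Code, Equipment_Code, Meter_Number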
Related
In Microsoft Access I need to bind a cross-tab query to a report. To bind the query I need to know the column names in advance, which is the problem, as I do not know which dates will be in the data. The dates end up as the column names in the cross-tab query; as the dates vary, the column names vary, and I can't bind the report because I don't know the column names. The best solution I can think of is to calculate a column that replaces each date with its order, so the column names will always be 1,2,3,4,5,6,7,8. However, I haven't been able to create this calculated order column; no matter what I do, Access bugs out. Part of the cause is that this table will be used for a cross-tab query with parameters, so Access bugs out at any additional complication.
How do I turn StatusDate into CalculatedOrder?
Edit:
I cannot use a subquery to make CalculatedOrder as is suggested in the comments, because the original query has parameters. Access bugs out when a subquery draws on a query based on parameters.
The real problem was two bugs in Access, and my guess is Access will bug out at any additional complexity. If the original table has parameters, work with a copy of the table that doesn't have parameters. In all cases, before calculating the crosstab, copy just your values into another temporary table, because in Access a crosstab doesn't work if a crosstab column variable is a correlated subquery.
You can make the CalculatedOrder column using a correlated subquery similar to allenbrowne.com/ranking.html.
If the table has parameters:
PARAMETERS myfirstparameter
SELECT StatusDate, CrossTabRow, CrossTabValue INTO NoParametersTable
FROM MyTableWithParameters
WHERE ...
Then make a query with only a grouped StatusDate.
SELECT NoParametersTable.StatusDate
FROM NoParametersTable
GROUP BY NoParametersTable.StatusDate;
Then turn StatusDate into Order using a correlated subquery:
SELECT CrossTabRow, CrossTabValue,
       (SELECT Count(dupe.StatusDate) + 1
        FROM [MyGroupedStatusDateQuery] AS dupe
        WHERE dupe.StatusDate < [MyGroupedStatusDateQuery].StatusDate) AS [Order]
FROM NoParametersTable
INNER JOIN [MyGroupedStatusDateQuery]
    ON NoParametersTable.StatusDate = [MyGroupedStatusDateQuery].StatusDate;
Finally, make sure to turn this final query into a table without a correlated subquery by copying the values into another temporary table. Run the crosstab on that final temporary table; it will work because the final table contains just values, with no parameters and no correlated subquery.
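For completeness, the final crosstab over that last temporary table would look something like this (a sketch; FinalTempTable stands in for whatever you name the temporary table, and First() is just one possible aggregate):
TRANSFORM First(CrossTabValue)
SELECT CrossTabRow
FROM FinalTempTable
GROUP BY CrossTabRow
PIVOT [Order];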
INSERT INTO #TEMP(ID,CID,STS,ETL_NBR,T_ID)
SELECT STG.ID,STG.CID,STG.STS,STG.ETL_NBR,STG.T_ID
FROM DBO.A_STAGE STG(NOLOCK)
INNER JOIN DBO.A_PRE PRE(NOLOCK)
ON PRE.ID=STG.ID AND PRE.CID=STG.CID
WHERE PRE.STS = 'D'
AND STG.ETL_NBR < PRE.ETL_NBR
The above query is constructed as dynamic SQL inside a stored procedure; the tables involved in the joins are actually passed in through variables. This query hangs even for a small volume of data.
1) If I perform a SELECT based on the above conditions, the query returns 0 records. As an INSERT it hangs for hours.
2) There is no blocking on this query.
Note: since this is a dynamic query, when other tables are passed to the variables it runs smoothly. It has an issue with this specific table only. I updated stats and rebuilt the index on that table, to no avail.
This can be the classic parameter sniffing problem; you can read more here: what-is-parameter-sniffing
There's a way to improve performance with such queries: add OPTION (RECOMPILE) at the end of the query.
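For example, the INSERT from the question with the hint appended (same tables and columns as above):
INSERT INTO #TEMP (ID, CID, STS, ETL_NBR, T_ID)
SELECT STG.ID, STG.CID, STG.STS, STG.ETL_NBR, STG.T_ID
FROM DBO.A_STAGE STG (NOLOCK)
INNER JOIN DBO.A_PRE PRE (NOLOCK)
    ON PRE.ID = STG.ID AND PRE.CID = STG.CID
WHERE PRE.STS = 'D'
  AND STG.ETL_NBR < PRE.ETL_NBR
OPTION (RECOMPILE);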
You can also use UNKNOWN for each variable, like this:
OPTION (OPTIMIZE FOR (@variable1 UNKNOWN, @variable2 UNKNOWN, ...))
You can read more here: improving-query-performance-with-option-recompile-constant-folding-and-avoiding-parameter-sniffing-issues/
PRE.CID=STG.CID
This particular join column has NULLs and spaces, and it is not a unique or PK column, hence the slowness. I excluded them from the join by adding PRE.CID IS NOT NULL.
The query now runs fast.
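In other words, the join becomes something like this (a sketch based on the query above):
INNER JOIN DBO.A_PRE PRE (NOLOCK)
    ON PRE.ID = STG.ID
    AND PRE.CID = STG.CID
    AND PRE.CID IS NOT NULL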
I tried to run the query below on Teradata and it returned the expected result:
select column1 as c1Alias from my_table where column2 in ( c1Alias , 10 , 20 , 30) ;
But when I ran the same query on Hive, it threw the exception below:
FAILED: SemanticException [Error 10004]: Line 1:44 Invalid table alias or column reference 'c1Alias': (possible column names are: .......)
I am not surprised that it fails on Hive, but I am surprised that it works on Teradata.
As per my understanding, clauses are executed in the order WHERE >> SELECT, so an alias generated in the SELECT clause should not be available for use in the WHERE clause. Correct me if I am wrong here.
I really want to know how this works in Teradata.
You're correct; logically any SELECT is processed in the following order:
FROM
WHERE
GROUP BY
HAVING
OLAP functions
QUALIFY
create SELECT column list
SAMPLE
ORDER BY
Besides the proprietary QUALIFY/SAMPLE, every DBMS does it exactly the same way.
When you add a filter in the WHERE condition, the column list has not yet been created, so using an alias should fail (and will fail in almost every other DBMS; afaik only Access allows it, similar to Teradata).
It's not failing because Teradata is older than Standard SQL; this seems to be a relic of the query language Teradata implemented first.
But it's a nice extension (just never alias to an existing column name, to avoid confusing the optimizer and/or end user) and you get used to it very fast; it avoids lots of cut & paste or Derived Tables.
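If you ever need the same query to run on Hive (or any other DBMS without this extension), the portable rewrite is a derived table, so the alias exists before the WHERE clause is applied:
SELECT c1Alias
FROM (SELECT column1 AS c1Alias, column2 FROM my_table) t
WHERE column2 IN (c1Alias, 10, 20, 30);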
The order of execution of SQL is explained very well over here:
https://www.eversql.com/sql-order-of-operations-sql-query-order-of-execution/
An excerpt from the post for your quick reference (credits to the author for covering all 10 parts of SQL):
1. FROM, including JOINs
2. WHERE
3. GROUP BY
4. HAVING
5. WINDOW functions
6. SELECT
7. DISTINCT
8. UNION
9. ORDER BY
10. LIMIT and OFFSET
EDIT: Here is a more complete set of code that shows exactly what's going on per the answer below.
libname output '/data/files/jeff';
%let DateStart = '01Jan2013'd;
%let DateEnd = '01Jun2013'd;
proc sql;
CREATE TABLE output.id AS (
SELECT DISTINCT id
FROM mydb.sale_volume AS sv
WHERE sv.category IN ('a', 'b', 'c') AND
sv.trans_date BETWEEN &DateStart AND &DateEnd
);
CREATE TABLE output.sums AS (
SELECT sv.id, SUM(sales) AS sales_total
FROM mydb.sale_volume AS sv
INNER JOIN output.id AS ids
ON ids.id = sv.id
WHERE sv.trans_date BETWEEN &DateStart AND &DateEnd
GROUP BY sv.id
);
quit;
The goal is to simply query the table for some id's based on category membership. Then I sum these members' activity across all categories.
The above approach is far slower than:
Running the first query to get the subset
Running a second query that sums every ID
Running a third query that inner joins the two result sets.
If I'm understanding correctly, it may be more efficient to make sure that all of my code is completely passed through rather than cross-loading.
After posting a question yesterday, a member suggested I might benefit from asking a separate question on performance that was more specific to my situation.
I'm using SAS Enterprise Guide to write some programs/data queries. I don't have permissions to modify the underlying data, which is stored in 'Teradata'.
My basic problem is writing efficient SQL queries in this environment. For example, I query a large table (with tens of millions of records) for a small subset of ID's. Then, I use this subset to query the larger table again:
proc sql;
CREATE TABLE subset AS (
SELECT
id
FROM
bigTable
WHERE
someValue = x AND
date BETWEEN a AND b
);
quit;
This works in a matter of seconds and returns 90k ID's. Next, I want to query this set of ID's against the big table, and problems ensue. I want to sum values over time for the ID's:
proc sql;
CREATE TABLE subset_data AS (
SELECT
bigTable.id,
SUM(bigTable.value) AS total
FROM
bigTable
INNER JOIN subset
ON subset.id = bigTable.id
WHERE
bigTable.date BETWEEN a AND b
GROUP BY
bigTable.id
);
quit;
For whatever reason, this takes a really long time. The difference is that the first query flags 'someValue'. The second looks at all activity, regardless of what's in 'someValue'. For example, I could flag every customer who orders a pizza. Then I would look at every purchase for all customers who ordered pizza.
I'm not overly familiar with SAS so I'm looking for any advice on how to do this more efficiently or speed things up. I'm open to any thoughts or suggestions and please let me know if I can offer more detail. I guess I'm just surprised the second query takes so long to process.
The most critical thing to understand when using SAS to access data in Teradata (or any other external database, for that matter) is that the SAS software prepares SQL and submits it to the database. The idea is to try to relieve you (the user) from all the database-specific details. SAS does this using a concept called "implicit pass-through", which just means that SAS translates SAS code into DBMS code. Among the many things that occur is data type conversion: SAS has two (and only two) data types, numeric and character.
SAS deals with translating things for you, but it can be confusing. For example, I've seen "lazy" database tables defined with VARCHAR(400) columns holding values that never exceed some smaller length (like a column for a person's name). In the database this isn't much of a problem, but since SAS does not have a VARCHAR data type, it creates a variable 400 characters wide for each row. Even with data set compression, this can make the resulting SAS dataset unnecessarily large.
The alternative way is to use "explicit pass-through", where you write native queries using the actual syntax of the DBMS in question. These queries execute entirely on the DBMS and return results back to SAS (which still does the data type conversion for you). For example, here is a "pass-through" query that performs a join of two tables and creates a SAS dataset as a result:
proc sql;
connect to teradata (user=userid password=password mode=teradata);
create table mydata as
select * from connection to teradata (
select a.customer_id
, a.customer_name
, b.last_payment_date
, b.last_payment_amt
from base.customers a
join base.invoices b
on a.customer_id=b.customer_id
where b.bill_month = date '2013-07-01'
and b.paid_flag = 'N'
);
quit;
Notice that everything inside the pair of parentheses is native Teradata SQL and that the join operation itself is running inside the database.
The example code you have shown in your question is NOT a complete, working example of a SAS/Teradata program. To better assist, you need to show the real program, including any library references. For example, suppose your real program looks like this:
proc sql;
CREATE TABLE subset_data AS
SELECT bigTable.id,
SUM(bigTable.value) AS total
FROM TDATA.bigTable bigTable
JOIN TDATA.subset subset
ON subset.id = bigTable.id
WHERE bigTable.date BETWEEN a AND b
GROUP BY bigTable.id
;
That would indicate a previously assigned LIBNAME statement through which SAS was connecting to Teradata. The syntax of that WHERE clause would be very relevant to whether SAS is even able to pass the complete query to Teradata. (Your example doesn't show what "a" and "b" refer to.) It is very possible that the only way SAS can perform the join is to drag both tables back into a local work session and perform the join on your SAS server.
One thing I can strongly suggest is that you try to convince your Teradata administrators to allow you to create "driver" tables in some utility database. The idea is that you would create a relatively small table inside Teradata containing the ID's you want to extract, then use that table to perform explicit joins. I'm sure you would need a bit more formal database training to do that (like how to define a proper index and how to "collect statistics"), but with that knowledge and ability, your work will just fly.
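A rough sketch of that driver-table idea (all names here are invented for illustration; your utility database, data types, and permissions will differ):
-- create a small driver table with a unique primary index on the join key
CREATE TABLE utildb.driver_ids (id INTEGER)
UNIQUE PRIMARY INDEX (id);

-- load just the IDs you care about
INSERT INTO utildb.driver_ids
SELECT DISTINCT id
FROM bigTable
WHERE someValue = 'x'
  AND trans_date BETWEEN DATE '2013-01-01' AND DATE '2013-06-01';

-- collect statistics so the optimizer can plan the join well
COLLECT STATISTICS ON utildb.driver_ids COLUMN (id);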
I could go on and on but I'll stop here. I use SAS with Teradata extensively every day against what I'm told is one of the largest Teradata environments on the planet. I enjoy programming in both.
You imply an assumption that the 90k records in your first query are all unique ids. Is that definite?
I ask because the implication from your second query is that they're not unique.
- One id can have multiple values over time, and have different somevalues
If the ids are not unique in the first dataset, you need to GROUP BY id or use DISTINCT in the first query.
Imagine that the 90k rows consist of 30k unique ids, and so have an average of 3 rows per id.
And then imagine those 30k unique ids actually have 9 records in your time window, including rows where somevalue <> x.
You will then get 3 x 9 = 27 records back per id.
And as those two numbers grow, the number of records in your second query grows geometrically.
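If that's the cause, the fix is simply to de-duplicate the first query (a sketch reusing the question's placeholder names):
CREATE TABLE subset AS (
SELECT DISTINCT id
FROM bigTable
WHERE someValue = x
AND date BETWEEN a AND b
);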
Alternative Query
If that's not the problem, an alternative query (which is not ideal, but possible) would be...
SELECT
bigTable.id,
SUM(bigTable.value) AS total
FROM
bigTable
WHERE
bigTable.date BETWEEN a AND b
GROUP BY
bigTable.id
HAVING
MAX(CASE WHEN bigTable.somevalue = x THEN 1 ELSE 0 END) = 1
If ID is unique and a single value, then you can try constructing a format.
Create a dataset that looks like this:
fmtname, start, label
where fmtname is the same for all records, a legal format name (begins and ends with a letter, contains alphanumeric or _); start is the ID value; and label is a 1. Then add one row with the same value for fmtname, a blank start, a label of 0, and another variable, hlo='o' (for 'other'). Then import into proc format using the CNTLIN option, and you now have a 1/0 value conversion.
Here's a brief example using SASHELP.CLASS. ID here is name, but it can be numeric or character - whichever is right for your use.
data for_fmt;
set sashelp.class;
retain fmtname '$IDF'; *Format name is up to you. Should have $ if ID is character, no $ if numeric;
start=name; *this would be your ID variable - the look up;
label='1';
output;
if _n_ = 1 then do;
hlo='o';
call missing(start);
label='0';
output;
end;
run;
proc format cntlin=for_fmt;
quit;
Now instead of doing a join, you can do your query 'normally' but with an additional where clause of and put(id,$IDF.)='1'. This won't be optimized with an index or anything, but it may be faster than the join. (It may also not be faster - depends on how the SQL optimizer is working.)
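For example, here is a sketch of the big query rewritten to use the format in place of the join (assuming the $IDF. format built above and the placeholder names from the question):
proc sql;
create table subset_data as
select id,
       sum(value) as total
from bigTable
where date between a and b
  and put(id, $IDF.) = '1'  /* the format look-up replaces the inner join */
group by id;
quit;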
If the id is unique you might add a UNIQUE PRIMARY INDEX(id) to that table; otherwise it defaults to a non-unique PI.
Knowing about uniqueness helps the optimizer produce a better plan.
Without more info like an Explain (just put EXPLAIN in front of the SELECT) it's hard to tell how this can be improved.
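For example (a sketch reusing the placeholder names from the question):
-- if id is unique, declare it so when creating the driver table
CREATE TABLE subset (id INTEGER)
UNIQUE PRIMARY INDEX (id);

-- prefix the slow query with EXPLAIN to see the plan Teradata chooses
EXPLAIN
SELECT bigTable.id, SUM(bigTable.value) AS total
FROM bigTable
INNER JOIN subset ON subset.id = bigTable.id
WHERE bigTable.date BETWEEN a AND b
GROUP BY bigTable.id;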
One alternate solution is to use SAS procedures. I don't know what your actual SQL is doing, but if you're just doing frequencies (or something else that can be done in a PROC), you could do:
proc sql;
create view blah as select ... (your join);
quit;
proc freq data=blah;
tables id/out=summary(rename=(count=total) keep=id count);
run;
Or any number of other options (PROC MEANS, PROC TABULATE, etc.). That may be faster than doing the sum in SQL (depending on some details, such as how your data is organized, what you're actually doing, and how much memory you have available). It has the added benefit that SAS might choose to do this in-database, if you create the view in the database, which might be faster. (In fact, if you just run the freq off the base table, it's possible that would be even faster, and then join the results to the smaller table).
How can I UPDATE a field of a table with the result of a SELECT query in Microsoft Access 2007?
Here's the Select Query:
SELECT Min(TAX.Tax_Code) AS MinOfTax_Code
FROM TAX, FUNCTIONS
WHERE (((FUNCTIONS.Func_Pure)<=[Tax_ToPrice]) AND ((FUNCTIONS.Func_Year)=[Tax_Year]))
GROUP BY FUNCTIONS.Func_ID;
And here's the Update Query:
UPDATE FUNCTIONS
SET FUNCTIONS.Func_TaxRef = [Result of Select query]
Well, it looks like Access can't do aggregates in UPDATE queries. But it can do aggregates in SELECT queries. So create a query with a definition like:
SELECT func_id, min(tax_code) as MinOfTax_Code
FROM Functions
INNER JOIN Tax
ON (Functions.Func_Year = Tax.Tax_Year)
AND (Functions.Func_Pure <= Tax.Tax_ToPrice)
GROUP BY Func_Id
And save it as YourQuery. Now we have to work around another Access restriction. UPDATE queries can't operate on queries, but they can operate on multiple tables. So let's turn the query into a table with a Make Table query:
SELECT YourQuery.*
INTO MinOfTax_Code
FROM YourQuery
This stores the content of the view in a table called MinOfTax_Code. Now you can do an UPDATE query:
UPDATE MinOfTax_Code
INNER JOIN Functions ON MinOfTax_Code.func_id = Functions.Func_ID
SET Functions.Func_TaxRef = [MinOfTax_Code].[MinOfTax_Code]
Doing SQL in Access is a bit of a stretch; I'd look into SQL Server Express Edition for your project!
I wrote about some of the limitations of correlated subqueries in Access/JET SQL a while back, and noted the syntax for joining multiple tables for SQL UPDATEs. Based on that info and some quick testing, I don't believe there's any way to do what you want with Access/JET in a single SQL UPDATE statement. If you could, the statement would read something like this:
UPDATE FUNCTIONS A
INNER JOIN (
SELECT AA.Func_ID, Min(BB.Tax_Code) AS MinOfTax_Code
FROM TAX BB, FUNCTIONS AA
WHERE AA.Func_Pure<=BB.Tax_ToPrice AND AA.Func_Year= BB.Tax_Year
GROUP BY AA.Func_ID
) B
ON B.Func_ID = A.Func_ID
SET A.Func_TaxRef = B.MinOfTax_Code
Alternatively, Access/JET will sometimes let you get away with saving a subquery as a separate query and then joining it in the UPDATE statement in a more traditional way. So, for instance, if we saved the SELECT subquery above as a separate query named FUNCTIONS_TAX, then the UPDATE statement would be:
UPDATE FUNCTIONS
INNER JOIN FUNCTIONS_TAX
ON FUNCTIONS.Func_ID = FUNCTIONS_TAX.Func_ID
SET FUNCTIONS.Func_TaxRef = FUNCTIONS_TAX.MinOfTax_Code
However, this still doesn't work.
I believe the only way you will make this work is to move the selection and aggregation of the minimum Tax_Code value out-of-band. You could do this with a VBA function, or more easily using the Access DLookup function. Save the GROUP BY subquery above to a separate query named FUNCTIONS_TAX and rewrite the UPDATE statement as:
UPDATE FUNCTIONS
SET Func_TaxRef = DLookup(
"MinOfTax_Code",
"FUNCTIONS_TAX",
"Func_ID = '" & Func_ID & "'"
)
Note that the DLookup function prevents this query from being used outside of Access, for instance via JET OLEDB. Also, the performance of this approach can be pretty terrible depending on how many rows you're targeting, as the subquery is being executed for each FUNCTIONS row (because, of course, it is no longer correlated, which is the whole point in order for it to work).
Good luck!
I had a similar problem. I wanted to find a string in one column and put that value in another column in the same table. The select statement below finds the text inside the parens.
When I created the query in Access I selected all fields. In the SQL view for that query, I replaced mytable.myfield (the field I wanted to receive the value from inside the parens) with:
SELECT Left(Right(OtherField,Len(OtherField)-InStr((OtherField),"(")),
Len(Right(OtherField,Len(OtherField)-InStr((OtherField),"(")))-1)
I ran a make table query. The make table query has all the fields, with the above substitution, and ends with INTO NameofNewTable FROM mytable.
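Put together, the make table query would look something like this (a sketch; NameofNewTable is the stand-in name used above, myfield is the field receiving the extracted text, and the real query would also list the table's other fields):
SELECT Left(Right(OtherField, Len(OtherField)-InStr(OtherField,"(")),
       Len(Right(OtherField, Len(OtherField)-InStr(OtherField,"(")))-1) AS myfield,
       OtherField
INTO NameofNewTable
FROM mytable;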
Does this work? Untested but should get the point across.
UPDATE FUNCTIONS
SET Func_TaxRef =
(
SELECT Min(TAX.Tax_Code) AS MinOfTax_Code
FROM TAX, FUNCTIONS F1
WHERE F1.Func_Pure <= [Tax_ToPrice]
AND F1.Func_Year=[Tax_Year]
AND F1.Func_ID = FUNCTIONS.Func_ID
GROUP BY F1.Func_ID
)
Basically for each row in FUNCTIONS, the subquery determines the minimum current tax code and sets FUNCTIONS.Func_TaxRef to that value. This is assuming that FUNCTIONS.Func_ID is a Primary or Unique key.
I did want to add one more answer that utilizes a VBA function, but it does get the job done in one SQL statement. It can be slow, though.
UPDATE FUNCTIONS
SET FUNCTIONS.Func_TaxRef = DLookUp("MinOfTax_Code", "SELECT
FUNCTIONS.Func_ID,Min(TAX.Tax_Code) AS MinOfTax_Code
FROM TAX, FUNCTIONS
WHERE (((FUNCTIONS.Func_Pure)<=[Tax_ToPrice]) AND ((FUNCTIONS.Func_Year)=[Tax_Year]))
GROUP BY FUNCTIONS.Func_ID;", "FUNCTIONS.Func_ID=" & Func_ID)
I know this topic is old, but I thought I could add something to it.
I could not make an Update with Select query work using SQL in MS Access 2010. I used Tomalak's suggestion to make this work. I had a screenshot, but am apparently too much of a newb on this site to be able to post it.
I was able to do this using the Query Design tool, but even as I was looking at a confirmed successful update query, Access was not able to show me the SQL that made it happen. So I could not make this work with SQL code alone.
I created and saved my select query as a separate query. In the Query Design tool, I added the table I'm trying to update and the select query I had saved (I put the unique key in the select query so there was a link between them). Just as Tomalak had suggested, I changed the Query Type to Update. I then just had to choose the fields (and designate the table) I was trying to update. In the "Update To" fields, I typed in the names of the fields from the select query I had brought in.
This format was successful and updated the original table.