sql query for apache airflow

sql query for apache airflow - sql

i would like to query the apache airflow database directly for a report of failed tasks, but i'm struggling with the appropriate join to make in the database.
what i would like is an output consisting of the following columns
dag_run.dag_id
dag_run.run_id
dag_run.state
dag_run.conf
task_instance.task_id
task_id.state
basically a sql dump of all dag_runs and the status of their tasks; similar to the 'Graph' view but all all run_ids.
thanks!

Here is the query for information about your failed task.
SELECT dr.dag_id, dr.run_id, dr.state, dr.conf, ti.task_id, ti.state
FROM dag_run as dr
INNER JOIN
(
SELECT dag_id, task_id, state, execution_date
FROM task_instance
WHERE state = 'failed'
) as ti
ON dr.dag_id = ti.dag_id AND dr.execution_date = ti.execution_date
I don't know what you want, but you can use DAG's on_failure_callback parameter to set it to do something when it fails. I recommend using this method.

Related

ORA-01841 happens on one environment but not all

I have the following SQL-code in my (SAP IdM) Application:
Select mcmskeyvalue as MKV,v1.searchvalue as STARTDATE, v2.avalue as Running_Changes_flag
from idmv_entry_simple
inner join idmv_value_basic_active v1 on mskey = mcmskey and attrname = 'Start_of_company_change'
and mcentrytype = 'MX_PERSON' and to_date(v1.searchvalue,'YYYY-MM-DD')<= sysdate+3
left join idmv_value_basic v2 on v2.mskey = mcmskey and v2.attrname = 'Running_Changes_flag'
where mcmskey not in (Select mskey from idmv_value_basic_active where attrname = 'Company_change_running_flag')
I already found the solution for the ORA-01841 problem, as it could either be a solution similar to MSSQLs try_to_date as mentioned here: How to handle to_date exceptions in a SELECT statment to ignore those rows?
or a solution where I change the code to something like this, to work soly on strings:
Select mcmskeyvalue as MKV,v1.searchvalue as STARTDATE, v2.avalue as Running_Changes_flag
from idmv_entry_simple
inner join idmv_value_basic_active v1 on mskey = mcmskey and attrname = 'Start_of_company_change'
and mcentrytype = 'MX_PERSON' and v1.searchvalue<= to_char(sysdate+3,'YYYY-MM-DD')
left join idmv_value_basic v2 on v2.mskey = mcmskey and v2.attrname = 'Running_Changes_flag'
where mcmskey not in (Select mskey from idmv_value_basic_active where attrname = 'Company_change_running_flag')
So for the actually problem I have a solution.
But now I came into discussion with my customers and teammates why the error happens at all.
Basically for all entries of idmv_value_basic_activ that comply to the requirement of "attrname = 'Start_of_company_change'" we can be sure that those are dates. In addition, if we execute the query to check all values that would be delivered, all are in a valid format.
I learned in university that the DB-Engine could decide in which order it will run individual segments of a query. So for me the most logical explanation would be that, on the development environment (where we face the problem), the section " to_date(v1.searchvalue,'YYYY-MM-DD')<= sysdate+3” is executed before the section “attrname = 'Start_of_company_change'”
Whereas on the productive environment, where everything works like a charm, the segments are executed in the order that is descripted by the SQL Statement.
Now my Question is:
First: do I remember that right, since the teacher said that only once and at that time I could not really make sense out of it
And Second: Is this assumption of mine correct or is there another reason for the problem?
Borderinformation:
The Tool uses a kind of shifted data structure which is why there can be quite a few different types in the actual “Searchvalue” column of the idmv_value_basic_activ view. The datatype on the database layer is always a varchar one.

"the DB-Engine could decide in which order it will run individual segments of a query"
This is correct. A SQL query is just a description of the data you want and where it's stored. Oracle will calculate an execution plan to retrieve that data as best it can. That plan will vary based on any number of factors, like the number of actual rows in the table and the presence of indexes, so it will vary from environment to environment.
So it sounds like you have an invalid date somewhere in your table, so to_date raises an exception. You can use validate_conversion to find it.

Build query that brings only sessions that have only errors?

I have a table with sessions events names. Each session can have 3 different types of events.
There are sessions that have only error type event and I need to identify them by getting a list those session.
I tried the following code:
SELECT
test.SessionId, SS.RequestId
FROM
(SELECT DISTINCT
SSE.SessionId,
SSE.type,
COUNT(SSE.SessionId) OVER (ORDER BY SSE.SessionId, SSE.type) AS total_XSESIONID_TYPE,
COUNT(SSE.SessionId) OVER (ORDER BY SSE.SessionId) AS total_XSESIONID
FROM
[CMstg].SessionEvents SSE
-- WHERE SSE.SessionId IN ('fa3ed523-60f9-4af0-a85f-1dec9e9d2cdb' )
) AS test
WHERE
test.total_XSESIONID_TYPE = test.total_XSESIONID
AND test.type = 'Errors'
-- AND test.SessionId IN ('fa3ed523-60f9-4af0-a85f-1dec9e9d2cdb' )
Each session can have more than one type, and I need to count only the sessions that have only type 'errors'. I don't want to include sessions that have additional types of events in the count
While I'm running the first query I'm getting a count of 3 error event per session, but while running the all procedure the number is multiplied to 90?
Sample table :
sessionID
type
fa3ed523-60f9-4af0-a85f-1dec9e9d2cdb
Errors
fa3ed523-60f9-4af0-a85f-1dec9e9d2cdb
Errors
fa3ed523-60f9-4af0-a85f-1dec9e9d2cdb
Errors
00c896a0-dccc-41bf-8dff-a5cd6856bb76
NonError
00c896a0-dccc-41bf-8dff-a5cd6856bb76
Errors
00c896a0-dccc-41bf-8dff-a5cd6856bb76
Errors
00c896a0-dccc-41bf-8dff-a5cd6856bb76
Errors
In this case I should get
sessionid = fa3ed523-60f9-4af0-a85f-1dec9e9d2cdb
Please advice - hope this is clearer now, thanks!

It's been a long time but I think something like this should get you the desired results:
SELECT securemeSessionId
FROM <TableName> -- replace with actual table name
GROUP BY securemeSessionId
HAVING COUNT(*) = COUNT(CASE WHEN type = 'errors' THEN 1 END)
And a pro tip: When asking sql-server questions, it's best to follow these guidelines

SELECT *
FROM NameOfDataBase
WHERE type!= 'errors'
Is it what you wanted to do?

"Error: Field '[REDACTED].field_id' not found on either side of the JOIN", Google BigQuery

As a beginner to Google's BigQuery platform, I have found it almost similar to MySql regarding its syntax. However, I am receiving an issue with my query where it is not finding a column on either side of the Inner Join I am performing.
A sample query below:
SELECT
base_account.random_table_name_transaction.context_id,
base_account.random_table_name_transaction.transaction_id,
base_account.random_table_name_transaction.meta_recordDate,
base_account.random_table_name_transaction.transaction_total,
base_account.random_table_name_transaction.view_id,
base_account.random_table_name_view.user_id,
base_account.random_table_name_view.view_id,
base_account.random_table_name_view.new_vs_returning,
base_account.random_table_name_experience.view_id,
base_account.random_table_name_experience.experienceId,
base_account.random_table_name_experience.experienceName,
base_account.random_table_name_experience.variationName,
base_account.random_table_name_experience.iterationId,
base_account.random_table_name_experience.isControl
FROM
[base_account.random_table_name_transaction] transactiontable
INNER JOIN
base_account.random_table_name_view viewtable
ON
transactiontable.view_id=viewtable.view_id
INNER JOIN
[base_account.random_table_name_experience] experiencetable
ON
viewtable.view_id=experiencetable.view_id
WHERE experiencetable.experienceId = 96659 or experiencetable.experienceId = 96660
In this case, when I run it within the BigQuery platform, after a few seconds of the query running I am returned an error:
"Error: Field 'base_account.random_table_name_experience.experienceId' not found on either side of the JOIN".
However, when I run the same query however I perform a SELECT * query, it does execute properly and returns the data I expect.
Is there something missing with my syntax as to why it is failing? I can confirm that each column I am trying to return does exist in each respected table.

Make sure to use standard SQL for your query to avoid some of the surprising aliasing rules with legacy SQL and to get more informative error messages. Your query would be:
#standardSQL
SELECT
base_account.random_table_name_transaction.context_id,
base_account.random_table_name_transaction.transaction_id,
base_account.random_table_name_transaction.meta_recordDate,
base_account.random_table_name_transaction.transaction_total,
base_account.random_table_name_transaction.view_id,
base_account.random_table_name_view.user_id,
base_account.random_table_name_view.view_id,
base_account.random_table_name_view.new_vs_returning,
base_account.random_table_name_experience.view_id,
base_account.random_table_name_experience.experienceId,
base_account.random_table_name_experience.experienceName,
base_account.random_table_name_experience.variationName,
base_account.random_table_name_experience.iterationId,
base_account.random_table_name_experience.isControl
FROM
`base_account.random_table_name_transaction` transactiontable
INNER JOIN
base_account.random_table_name_view viewtable
ON
transactiontable.view_id=viewtable.view_id
INNER JOIN
`base_account.random_table_name_experience` experiencetable
ON
viewtable.view_id=experiencetable.view_id
WHERE experiencetable.experienceId = 96659 OR experiencetable.experienceId = 96660;
Note that the only changes I made were to put the #standardSQL at the start (to enable standard SQL) and to escape the table names with backticks rather than brackets.

How to get a query definition from Cognos?

Is it possible to view the SQL used in Cognos's queries?
e.g. To get the XML definition of a report you can use the below SQL (copied from https://stackoverflow.com/a/24335760/361842):
SELECT CMOBJNAMES.NAME AS ObjName
, CMOBJECTS.PCMID
, CMCLASSES.NAME AS ClassName
, cast(CMOBJPROPS7.spec as xml) ReportDefinition
FROM CMOBJECTS
INNER JOIN CMOBJNAMES ON CMOBJECTS.CMID = CMOBJNAMES.CMID
INNER JOIN CMCLASSES ON CMOBJECTS.CLASSID = CMCLASSES.CLASSID
LEFT OUTER JOIN CMOBJPROPS7 ON CMOBJECTS.CMID = CMOBJPROPS7.CMID
WHERE CMOBJECTS.CLASSID IN (10, 37)
ORDER BY CMOBJECTS.PCMID;
... and from that XML you can often find sqltext elements giving the underlying SQL. However, where existing queries are being used it's hard to see where that data's coming from.
I'd like the equivalent of the above SQL to find Query definitions; though so far have been unable to find any such column.
Failing that, is there a way to find this definition through the UI? I looked under Query Studio and found the query's lineage which gives some information about the query columns, but doesn't make the data's source clear.
NB: By query I'm referring to those such as R5BZDDAN_GRAPH in the below screenshot from Query Studio:
... which would be referred to in a Cognos report in a way such as:
<query name="Q_DEMO">
<source>
<model/>
</source>
<selection autoSummary="false">
<dataItem aggregate="none" name="REG_REG" rollupAggregate="none">
<expression>[AdvRepData].[Q_R5BZDDAN_GRAPH].[REG_REG]</expression>
</dataItem>
<dataItem aggregate="none" name="REG_ORG" rollupAggregate="none">
<expression>[AdvRepData].[Q_R5BZDDAN_GRAPH].[REG_ORG]</expression>
</dataItem>
<!-- ... -->
UPDATE
For the benefit of others, here's an amended version of the above code for pulling back report definitons:
;with recurse
as (
select Objects.CMID Id, ObjectClasses.Name Class, ObjectNames.NAME Name
, cast('CognosObjects' as nvarchar(max)) ObjectPath
from CMOBJECTS Objects
inner join CMOBJNAMES ObjectNames
on ObjectNames.CMID = Objects.CMID
and ObjectNames.IsDefault = 1 --only get 1 result per object (could filter on language=English (LocaleId=24 / select LocaleId from CMLOCALES where Locale = 'en'))
inner join CMCLASSES ObjectClasses on ObjectClasses.CLASSID = Objects.CLASSID
where Objects.PCMID = objects.CMID --cleaner than selecting on root since not language sensitive
--where ObjectClasses.NAME = 'root'
union all
select Objects.CMID Id, ObjectClasses.Name Class, ObjectNames.NAME Name
, r.ObjectPath + '\' + ObjectNames.NAME ObjectPath --I use a backslash rather than forward slash as using this to build a windows path
from recurse r
inner join CMOBJECTS Objects
on objects.PCMID = r.Id
and Objects.PCMID != objects.CMID --prevent ouroboros
inner join CMOBJNAMES ObjectNames
on ObjectNames.CMID = Objects.CMID
and ObjectNames.IsDefault = 1 --only get 1 result per object (could filter on language=English (LocaleId=24 / select LocaleId from CMLOCALES where Locale = 'en'))
inner join CMCLASSES ObjectClasses
on ObjectClasses.CLASSID = Objects.CLASSID
)
select *
from recurse
where Class in ('report','query')
order by ObjectPath

Terminology:
Query Subject can be considered a table
Query Item can be considered a column
For your example the SQL might be defined in the R5BZDDAN_GRAPH query subject which is in turn defined in the Framework Manager model. The framework manager model is defined in a .cpf file which isn't in the content store at all. (it is an XML file though). This file is 'published' to Cognos to make packages.
There is also a cached version of the framework manager file on the actual cognos server (a .cqe file) although it is generally not recommended to rely on this
I say your SQL might be defined. If the query subject is a SQL query subject then that is where it is defined. If If the query subject is a model query subject then it is just a list of query items from other query subjects. These might be from many other query subjects which then have joins defined in Framework Manager. So there is no actual SQL defined there - it gets generated at run time
I'm not sure of your end requirement but there are three other ways to get SQL:
In Report Studio you can 'show generated SQL' on each query
In Framework Manager you can select one or more query subjects and show generated SQL
You can use a monitoring tool on your database to see what SQL is being submitted
If you just want to know how numbers are generated in your report, the most direct solution is to monitor your database.
Lastly keep in mind that in some rare cases, SQL defined in Framework Manager might be altered by the way the report is written

SQL Server correlated queries

I currently have some SQL code which looks like this.
SELECT
WORKORDER_BASE_ID As 'Work Order',
SCHED_START_DATE AS 'Scheduled Start',
SETUP_HRS AS 'Approx Setup Hrs',
RUN_HRS AS 'Approx Run Hrs',
(
SELECT
PART_ID as "Components"
FROM
REQUIREMENT
WHERE
REQUIREMENT.WORKORDER_BASE_ID = OPERATION.WORKORDER_BASE_ID
AND REQUIREMENT.WORKORDER_SUB_ID = OPERATION.WORKORDER_SUB_ID
AND PART_ID LIKE '%PSU%'
) AS PSU
FROM
OPERATION
WHERE
OPERATION.STATUS = 'R'
AND RESOURCE_ID LIKE '%{Root Container.equipmentName}%'
I am receiving errors because the sub-query generates more than one field. What I need is a way to specify a condition to the sub-query to only display data pertaining to a the specific work order in that row, specifically the work order number. I'm guessing this is some sort of loop function.
Any suggestions?
BTW, the bottom line is correct. The platform I use interprets values in braces as local variables.

A Sub-Query in select MUST return a scalar value. Use TOP 1 in your select and it should fix the issue. Or Alternately you can do the following ....
SELECT OPERATION.WORKORDER_BASE_ID AS [Work Order]
,OPERATION.SCHED_START_DATE AS [Scheduled Start]
,OPERATION.SETUP_HRS AS [Approx Setup Hrs]
,OPERATION.RUN_HRS AS [Approx Run Hrs]
,REQUIREMENT.PART_ID AS [Components]
FROM OPERATION
INNER JOIN REQUIREMENT ON REQUIREMENT.WORKORDER_BASE_ID = OPERATION.WORKORDER_BASE_ID
AND REQUIREMENT.WORKORDER_SUB_ID = OPERATION.WORKORDER_SUB_ID
WHERE OPERATION.[STATUS] ='R'
AND OPERATION.RESOURCE_ID LIKE '%{Root Container.equipmentName}%'
AND REQUIREMENT.PART_ID LIKE '%PSU%'

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas