Why is "Acute HIV Infection" not classified as a sexually transmitted disease in SNOMED-CT? - icd

I'm trying to compile a list of sexually transmitted diseases using SNOMED-CT (I happen to be using OHDSI/OMOP concept tables in Databricks as my source for SNOMED-CT).
Querying the SNOMED-CT data from OHDSI Athena in OHDSI/OMOP with the query below, I find that notably missing from the returned ancestors is any indication that HIV is a sexually transmitted disease/infection.
Is there a way to use SNOMED-CT to create a somewhat comprehensive list of sexually transmitted diseases? Is there a better way to get to a list of codes (SNOMED or other, e.g. ICD) for sexually transmitted diseases?
select distinct
    parent.concept_id,
    parent.concept_code,
    parent.vocabulary_id,
    parent.concept_name,
    an.max_levels_of_separation,
    an.min_levels_of_separation
from concept con
join concept_ancestor an on 1=1
    and an.descendant_concept_id = con.concept_id
join concept parent on 1=1
    and parent.concept_id = an.ancestor_concept_id
where 1=1
    and con.vocabulary_id = 'SNOMED'
    and lower(con.concept_name) = 'acute hiv infection'
    and con.domain_id = 'Condition'
order by parent.concept_name
;
This finding seems to be confirmed using other SNOMED-CT browsers, for example:
https://browser.ihtsdotools.org/?perspective=full&conceptId1=62479008&edition=MAIN/SNOMEDCT-US/2022-03-01&release=&languages=en
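For context, OMOP's concept_ancestor table is a precomputed transitive closure of the modelled "Is a" relationships. A minimal sqlite3 sketch of that closure (toy concept IDs and names, not real SNOMED codes or the real OMOP schema) shows why a concept only gains an ancestor if a modelled "Is a" chain actually leads there:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE concept_relationship (child INTEGER, parent INTEGER)")
# Toy hierarchy: 1 = Acute HIV infection, 2 = HIV infection, 3 = Viral disease.
# There is deliberately NO edge to 4 = Sexually transmitted infectious disease.
con.executemany("INSERT INTO concept_relationship VALUES (?, ?)",
                [(1, 2), (2, 3)])

# Compute the transitive closure, i.e. what concept_ancestor precomputes.
ancestors = con.execute("""
    WITH RECURSIVE closure(descendant, ancestor) AS (
        SELECT child, parent FROM concept_relationship
        UNION
        SELECT c.descendant, r.parent
        FROM closure c
        JOIN concept_relationship r ON r.child = c.ancestor
    )
    SELECT ancestor FROM closure WHERE descendant = 1 ORDER BY ancestor
""").fetchall()
print(ancestors)  # concept 4 never appears: no 'Is a' path reaches it
```

In other words, if SNOMED's authors did not model an "Is a" (or other) path from Acute HIV infection to a sexually-transmitted-disease grouper concept, no amount of ancestor traversal will surface one.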


DBSQL_SQL_INTERNAL_DB_ERROR SQL error 2048

I have to join two tables, ACDOCA and BKPF. I have written the following code for it:
SELECT a~rbukrs,
       a~racct,
       a~bldat,
       a~blart,
       a~kunnr,
       a~belnr,
       a~sgtxt,
       b~xblnr,
       a~budat,
       a~hsl,
       a~prctr
  INTO TABLE @it_final
  FROM acdoca AS a
  LEFT OUTER JOIN bkpf AS b
    ON a~rbukrs = b~bukrs
   AND a~gjahr = b~gjahr
  WHERE a~rbukrs IN @s_bukrs
    AND a~kunnr IN @s_kunnr
    AND a~budat IN @s_budat
    AND a~belnr IN @s_belnr
    AND a~rldnr IN @s_rldnr
    AND a~blart = 'DR' OR a~blart = 'ZK' OR a~blart = 'UE'.
I am facing the following error:
Runtime error: DBSQL_SQL_INTERNAL_DB_ERROR
SQL error "SQL code: 2048" occurred while accessing table "ACDOCA".
Short Text: An exception has occurred in class "CX_SY_OPEN_SQL_DB"
How do I resolve this? Please help.
A few things:
Selecting directly from the database tables is error-prone (e.g. you'll forget keys while joining) and you have to deal with those terrible German abbreviations (e.g. Belegnummer -> belnr). For quite some time now there have been CDS views on top, such as I_JournalEntryItem, with associations and proper English names for the fields; if you can use them, I would (they are also C1-released).
As already pointed out by xQBert, the query probably does not work as intended: AND has precedence over OR, so your query basically returns everything from ACDOCA, multiplied by everything from BKPF, which likely leads to the database error you've posted.
With range queries you might still get a lot of results (billions of entries, depending on your company's size). You should either limit the query with UP TO, implement some pagination, or COUNT(*) first and show an error to the user if the result set is too large.
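The AND-over-OR precedence point can be demonstrated with a few rows in any SQL engine; here is a minimal sqlite3 sketch (table and values are made up for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE acdoca (rbukrs TEXT, blart TEXT)")
con.executemany("INSERT INTO acdoca VALUES (?, ?)",
                [("1000", "DR"), ("2000", "DR"), ("2000", "ZK"), ("2000", "UE")])

# Unparenthesised: parsed as (rbukrs = '1000' AND blart = 'DR') OR blart = 'ZK' OR blart = 'UE'
loose = con.execute(
    "SELECT * FROM acdoca WHERE rbukrs = '1000' AND blart = 'DR' "
    "OR blart = 'ZK' OR blart = 'UE'").fetchall()

# Intended: the company-code filter applies to all three document types.
strict = con.execute(
    "SELECT * FROM acdoca WHERE rbukrs = '1000' AND blart IN ('DR', 'ZK', 'UE')"
).fetchall()

print(loose)   # ZK/UE rows from company 2000 leak in
print(strict)  # only the company 1000 row
```

With the unparenthesised condition, every 'ZK' and 'UE' row in the table passes the filter, regardless of company code; on a table the size of ACDOCA that is the whole problem.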
I would write that like this:
TYPES:
  BEGIN OF t_filters,
    company_codes        TYPE RANGE OF bukrs,
    customers            TYPE RANGE OF kunnr,
    document_dates       TYPE RANGE OF budat,
    accounting_documents TYPE RANGE OF fis_belnr,
    ledgers              TYPE RANGE OF rldnr,
  END OF t_filters.

DATA(filters) = VALUE t_filters(
  " filter here
).

SELECT FROM I_JournalEntryItem
  FIELDS
    CompanyCode,
    GLAccount,
    DocumentDate,
    AccountingDocumentType,
    Customer,
    AccountingDocument,
    DocumentItemText,
    \_JournalEntry-DocumentReferenceID,
    PostingDate,
    AmountInCompanyCodeCurrency,
    ProfitCenter
  WHERE
    CompanyCode            IN @filters-company_codes AND
    Customer               IN @filters-customers AND
    DocumentDate           IN @filters-document_dates AND
    AccountingDocument     IN @filters-accounting_documents AND
    Ledger                 IN @filters-ledgers AND
    AccountingDocumentType IN ( 'DR', 'ZK', 'UE' )
  INTO TABLE @DATA(sales_orders)
  UP TO 100 ROWS.
(As a bonus you'll get proper DCL authorization checks)
2048 is (or can be) a memory allocation error: too much data being returned. Given that, this line is highly suspect:
AND a~blart = 'DR' OR a~blart = 'ZK' OR a~blart = 'UE'.
I'd consider this instead. Otherwise ALL blart 'ZK' and 'UE' records are returned regardless of customer, year, company, etc.:
SELECT a~rbukrs,
       a~racct,
       a~bldat,
       a~blart,
       a~kunnr,
       a~belnr,
       a~sgtxt,
       b~xblnr,
       a~budat,
       a~hsl,
       a~prctr
  INTO TABLE @it_final
  FROM acdoca AS a
  LEFT OUTER JOIN bkpf AS b
    ON a~rbukrs = b~bukrs
   AND a~gjahr = b~gjahr
  WHERE a~rbukrs IN @s_bukrs
    AND a~kunnr IN @s_kunnr
    AND a~budat IN @s_budat
    AND a~belnr IN @s_belnr
    AND a~rldnr IN @s_rldnr
    AND a~blart IN ('DR','ZK','UE').
However, if you really did mean to return all blart 'ZK' and 'UE' records, plus only those 'DR' records within the defined parameters, then you're simply asking for too much data from the system and need to limit your result set, and somehow let the user know that only a limited set is being returned due to data volume.
I'd also make sure your join covers the full key. Fiscal year and company code are an incomplete key for BKPF. I don't know the ACDOCA table well enough to say whether that's a proper join, but an incomplete join like this can produce a semi-cartesian product that contributes to the data bloat. In a multi-tenant database you may need to join on mandt as well, and possibly the document number and some other fields; this looks like an incomplete join on the key, so perhaps more is needed there too.

ORA-01841 happens on one environment but not all

I have the following SQL-code in my (SAP IdM) Application:
Select mcmskeyvalue as MKV,v1.searchvalue as STARTDATE, v2.avalue as Running_Changes_flag
from idmv_entry_simple
inner join idmv_value_basic_active v1 on mskey = mcmskey and attrname = 'Start_of_company_change'
and mcentrytype = 'MX_PERSON' and to_date(v1.searchvalue,'YYYY-MM-DD')<= sysdate+3
left join idmv_value_basic v2 on v2.mskey = mcmskey and v2.attrname = 'Running_Changes_flag'
where mcmskey not in (Select mskey from idmv_value_basic_active where attrname = 'Company_change_running_flag')
I already found a solution for the ORA-01841 problem itself: either something similar to MSSQL's try_to_date, as mentioned here: How to handle to_date exceptions in a SELECT statment to ignore those rows?
or a change to the code so that it works solely on strings, like this:
Select mcmskeyvalue as MKV,v1.searchvalue as STARTDATE, v2.avalue as Running_Changes_flag
from idmv_entry_simple
inner join idmv_value_basic_active v1 on mskey = mcmskey and attrname = 'Start_of_company_change'
and mcentrytype = 'MX_PERSON' and v1.searchvalue<= to_char(sysdate+3,'YYYY-MM-DD')
left join idmv_value_basic v2 on v2.mskey = mcmskey and v2.attrname = 'Running_Changes_flag'
where mcmskey not in (Select mskey from idmv_value_basic_active where attrname = 'Company_change_running_flag')
So for the actual problem I have a solution.
But now I got into a discussion with my customers and teammates about why the error happens at all.
Basically, for all entries of idmv_value_basic_active that satisfy the condition attrname = 'Start_of_company_change', we can be sure the values are dates. In addition, if we execute the query to check all values that would be delivered, all are in a valid format.
I learned at university that the DB engine can decide in which order it runs individual parts of a query. So for me the most logical explanation would be that, on the development environment (where we face the problem), the predicate to_date(v1.searchvalue,'YYYY-MM-DD') <= sysdate+3 is evaluated before the predicate attrname = 'Start_of_company_change',
whereas on the productive environment, where everything works like a charm, the predicates are evaluated in the order described by the SQL statement.
Now my questions are:
First: do I remember that correctly? The teacher only mentioned it once, and at the time I could not really make sense of it.
Second: is this assumption of mine correct, or is there another reason for the problem?
Background information:
The tool uses a kind of shifted data structure, which is why there can be quite a few different types in the "searchvalue" column of the idmv_value_basic_active view. The datatype on the database layer is always a varchar.
"the DB-Engine could decide in which order it will run individual segments of a query"
This is correct. A SQL query is just a description of the data you want and where it's stored. Oracle will calculate an execution plan to retrieve that data as best it can. That plan will vary based on any number of factors, like the number of actual rows in the table and the presence of indexes, so it will vary from environment to environment.
So it sounds like you have an invalid date somewhere in your table, which makes to_date raise an exception. You can use validate_conversion to find it.
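Oracle's validate_conversion returns 1 or 0 per row depending on whether the value would convert. The same hunt for bad values can be sketched outside the database (hypothetical values, Python standard library only):

```python
from datetime import datetime

def is_valid_date(value, fmt="%Y-%m-%d"):
    """Mimic VALIDATE_CONVERSION(value AS DATE, 'YYYY-MM-DD'): 1 if parseable, else 0."""
    try:
        datetime.strptime(value, fmt)
        return 1
    except ValueError:
        return 0

# Hypothetical searchvalue contents: one value is well-formed but impossible,
# one is not a date at all -- both would make to_date() throw ORA-01841/ORA-01839.
searchvalues = ["2023-01-15", "2023-02-30", "not-a-date", "2024-12-01"]
bad = [v for v in searchvalues if not is_valid_date(v)]
print(bad)
```

Note that "2023-02-30" is rejected too: a value can match the format mask and still not be a real date, which is exactly the kind of row that only blows up once the optimizer happens to evaluate to_date against it.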

How to use SparkR::read.jdbc() or sparklyr::spark_read_jdbc() to get results of SQL query rather than whole table?

I usually use RODBC locally to query my databases. However our company has recently moved to Azure Databricks which does not inherently support RODBC or other odbc connections, but does support jdbc connections which I have not previously used.
I have read the documentation for SparkR::read.jdbc() and sparklyr::spark_read_jdbc() but these seem to pull an entire table from the database rather than just the results of a query, which is not suitable for me as I never have to pull whole tables and instead run queries that join multiple tables together but only return a very small subset of the data in each table.
I cannot find a method for using the jdbc connector to:
(A) run a query referring to multiple tables on the same database
and
(B) store the results as an R dataframe or something that can very easily be converted to an R dataframe (such as a SparkR or sparklyr dataframe).
If possible, the solution would also only require me to specify the connection credentials once per script/notebook rather than every time I connect to the database to run a query and store the results as a dataframe.
e.g. is there a jdbc equivalent of the following:
my_server="myserver.database.windows.net"
my_db="mydatabase"
my_username="database_user"
my_pwd="abc123Ineedabetterpassword"
myconnection <- RODBC::odbcDriverConnect(paste0("DRIVER={SQL Server};
server=",my_server,";
database=",my_db,";
uid=",my_username,";
pwd=",my_pwd))
df <- RODBC::sqlQuery(myconnection,
"SELECT a.var1, b.var2, SUM(c.var3) AS Total_Things, AVG(d.var4) AS Mean_Stuff
FROM table_A as a
JOIN table_B as b on a.id = b.a_id
JOIN table_C as c on a.id = c.a_id
JOIN table_D as d on c.id = d.c_id
Where a.filter_var IN (1, 2, 3, 4)
AND d.filter_var LIKE '%potatoes%'
GROUP BY
a.var1, b.var2
")
df2 <- RODBC::sqlQuery(myconnection,
"SELECT x.var1, y.var2, z.var3
FROM table_x as x
LEFT JOIN table_y as y on x.id = y.x_id
LEFT JOIN table_z as z on x.id = z.x_id
WHERE z.category like '%vegetable%'
AND y.category IN ('A', 'B', 'C')
")
How would I do something that gives the same results (two R dataframes df and df2) as the above using the jdbc connectors from SparkR or sparklyr inbuilt in Databricks?
I know that I can use the Spark connector and some Scala code (https://learn.microsoft.com/en-us/azure/sql-database/sql-database-spark-connector) to store the query results as a Spark dataframe, convert this to a global temp table, store the global temp table as a SparkR dataframe, and collapse that to an R dataframe. But this code is very difficult to read, requires me to change the language to Scala (which I do not know well) for one of the cells in my notebook, and takes a really long time due to the large number of steps. Because my R script often starts with several SQL queries -- often to multiple different databases -- this method gets very time-consuming and makes my scripts almost unreadable. Surely there is a more straightforward way?
(We are using Databricks primarily for automation via LogicApps and Azure Data Factory, and occasionally for increased RAM, rather than for parallel processing; our data (once extracted) are generally not large enough to require parallelisation and some of the models we use (e.g. lme4::lmer()) do not benefit from it.)
I worked this out eventually and want to post the answer here in case anyone else is having issues.
You can use SparkR::read.jdbc() with a query, but you must surround the query in brackets and alias the result as something, otherwise you will get an ambiguous-syntax error. The port seems to work fine for me left at the default 1433 (SQL Server), but if you have a different kind of SQL database you might need to change it in the URL. Then you can call SparkR::collect() on the SparkDataFrame containing the query results to convert it to an R dataframe:
e.g.
myconnection <- "jdbc:sqlserver://myserver.database.windows.net:1433;database=mydatabase;user=database_user;password=abc123Ineedabetterpassword"
df <- read.jdbc( myconnection, "(
SELECT a.var1, b.var2, SUM(c.var3) AS Total_Things, AVG(d.var4) AS Mean_Stuff
FROM table_A as a
JOIN table_B as b on a.id = b.a_id
JOIN table_C as c on a.id = c.a_id
JOIN table_D as d on c.id = d.c_id
Where a.filter_var IN (1, 2, 3, 4)
AND d.filter_var LIKE '%potatoes%'
GROUP BY
a.var1, b.var2) as result" ) %>%
SparkR::collect()
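The reason the brackets and alias are required is that the JDBC reader treats whatever string you pass as a table expression, roughly composing SELECT * FROM &lt;your string&gt; before sending it to the database. A small sqlite3 illustration of that composition (toy table standing in for the remote JDBC source):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE table_a (id INTEGER, var1 TEXT)")
con.executemany("INSERT INTO table_a VALUES (?, ?)", [(1, "x"), (2, "y")])

# What you would pass as the "table" argument: a parenthesised, aliased subquery.
dbtable = "(SELECT var1 FROM table_a WHERE id = 1) AS result"

# Roughly what the reader composes and pushes down to the database.
rows = con.execute(f"SELECT * FROM {dbtable}").fetchall()
print(rows)
```

Because the whole query runs inside the source database, only the small result set crosses the wire, which is exactly the behaviour the question was after.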

How to get a query definition from Cognos?

Is it possible to view the SQL used in Cognos's queries?
e.g. To get the XML definition of a report you can use the below SQL (copied from https://stackoverflow.com/a/24335760/361842):
SELECT CMOBJNAMES.NAME AS ObjName
, CMOBJECTS.PCMID
, CMCLASSES.NAME AS ClassName
, cast(CMOBJPROPS7.spec as xml) ReportDefinition
FROM CMOBJECTS
INNER JOIN CMOBJNAMES ON CMOBJECTS.CMID = CMOBJNAMES.CMID
INNER JOIN CMCLASSES ON CMOBJECTS.CLASSID = CMCLASSES.CLASSID
LEFT OUTER JOIN CMOBJPROPS7 ON CMOBJECTS.CMID = CMOBJPROPS7.CMID
WHERE CMOBJECTS.CLASSID IN (10, 37)
ORDER BY CMOBJECTS.PCMID;
... and from that XML you can often find sqltext elements giving the underlying SQL. However, where existing queries are being used it's hard to see where that data's coming from.
I'd like the equivalent of the above SQL to find Query definitions; though so far have been unable to find any such column.
Failing that, is there a way to find this definition through the UI? I looked under Query Studio and found the query's lineage which gives some information about the query columns, but doesn't make the data's source clear.
NB: By query I'm referring to those such as R5BZDDAN_GRAPH in the below screenshot from Query Studio:
... which would be referred to in a Cognos report in a way such as:
<query name="Q_DEMO">
<source>
<model/>
</source>
<selection autoSummary="false">
<dataItem aggregate="none" name="REG_REG" rollupAggregate="none">
<expression>[AdvRepData].[Q_R5BZDDAN_GRAPH].[REG_REG]</expression>
</dataItem>
<dataItem aggregate="none" name="REG_ORG" rollupAggregate="none">
<expression>[AdvRepData].[Q_R5BZDDAN_GRAPH].[REG_ORG]</expression>
</dataItem>
<!-- ... -->
UPDATE
For the benefit of others, here's an amended version of the above code for pulling back report definitions:
;with recurse as (
    select Objects.CMID Id, ObjectClasses.Name Class, ObjectNames.NAME Name
        , cast('CognosObjects' as nvarchar(max)) ObjectPath
    from CMOBJECTS Objects
    inner join CMOBJNAMES ObjectNames
        on ObjectNames.CMID = Objects.CMID
        and ObjectNames.IsDefault = 1 --only get 1 result per object (could filter on language=English (LocaleId=24 / select LocaleId from CMLOCALES where Locale = 'en'))
    inner join CMCLASSES ObjectClasses
        on ObjectClasses.CLASSID = Objects.CLASSID
    where Objects.PCMID = Objects.CMID --cleaner than selecting on root since not language sensitive
    --where ObjectClasses.NAME = 'root'

    union all

    select Objects.CMID Id, ObjectClasses.Name Class, ObjectNames.NAME Name
        , r.ObjectPath + '\' + ObjectNames.NAME ObjectPath --backslash rather than forward slash as this is used to build a Windows path
    from recurse r
    inner join CMOBJECTS Objects
        on Objects.PCMID = r.Id
        and Objects.PCMID != Objects.CMID --prevent ouroboros
    inner join CMOBJNAMES ObjectNames
        on ObjectNames.CMID = Objects.CMID
        and ObjectNames.IsDefault = 1
    inner join CMCLASSES ObjectClasses
        on ObjectClasses.CLASSID = Objects.CLASSID
)
select *
from recurse
where Class in ('report','query')
order by ObjectPath
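The parent-child path walk above can be sketched on a toy table (sqlite3 here; the table and rows are stand-ins for the Cognos content-store tables, which collapse CMOBJECTS/CMOBJNAMES into one for brevity):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE objects (cmid INTEGER, pcmid INTEGER, name TEXT)")
# Toy content store: the root is its own parent, as in CMOBJECTS.
con.executemany("INSERT INTO objects VALUES (?, ?, ?)", [
    (1, 1, "root"),
    (2, 1, "Public Folders"),
    (3, 2, "Sales Report"),
])

paths = con.execute(r"""
    WITH RECURSIVE recurse(id, path) AS (
        SELECT cmid, 'CognosObjects' FROM objects WHERE pcmid = cmid
        UNION ALL
        SELECT o.cmid, r.path || '\' || o.name
        FROM recurse r
        JOIN objects o ON o.pcmid = r.id AND o.pcmid != o.cmid  -- prevent ouroboros
    )
    SELECT path FROM recurse ORDER BY path
""").fetchall()
print(paths)  # each object with its full backslash-separated path from the root
```

The self-parenting root is why both the anchor member (pcmid = cmid) and the cycle guard in the recursive member (pcmid != cmid) are needed.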
Terminology:
Query Subject can be considered a table
Query Item can be considered a column
For your example, the SQL might be defined in the R5BZDDAN_GRAPH query subject, which is in turn defined in the Framework Manager model. The Framework Manager model is defined in a .cpf file, which isn't in the content store at all (it is an XML file, though). This file is 'published' to Cognos to make packages.
There is also a cached version of the Framework Manager file on the actual Cognos server (a .cqe file), although it is generally not recommended to rely on this.
I say the SQL might be defined there: if the query subject is a SQL query subject, then that is where it is defined. If the query subject is a model query subject, then it is just a list of query items from other query subjects. These might come from many other query subjects which then have joins defined in Framework Manager, so there is no actual SQL defined there; it gets generated at run time.
I'm not sure of your end requirement but there are three other ways to get SQL:
In Report Studio you can 'show generated SQL' on each query
In Framework Manager you can select one or more query subjects and show generated SQL
You can use a monitoring tool on your database to see what SQL is being submitted
If you just want to know how numbers are generated in your report, the most direct solution is to monitor your database.
Lastly keep in mind that in some rare cases, SQL defined in Framework Manager might be altered by the way the report is written

Need to return multiple entries from a single field in One Table

So here is the problem: I have a requirement where I need a customer type to equal two different things.
To cover the requirement, I don't need the customer type to equal Client or NonClient, but to equal Client and NonClient. Each Customer_No can have multiple customer types.
Here is an example of what I have worked on so far. If you know a better way of optimizing this, as well as solving the problem, please let me know.
The output should look like this:
CustomerID CustomerType CustomerType
--------------------------------------
2345       Client       NonClient
Select TB1.Customer_ID, IB1.Customer_Type, AS Non_client IB1.Customer_Type AS Client
From Client TB1, Client_ReF XB1, Client_Instr IB1, Client_XREC FB1
Where XB1.Client_NO = TB1.Client_NO
AND FB1.Client_ACCT = TB1.ACCT
AND XB1.Client_Instruct_NO = IB1.Client_Instruct_NO
AND FB1.Customer_ID= TB1. Client_NO
AND IB1.Client = 'Client'
AND IB1.Non_Client = 'NonClient'
I have omitted a few other filters that I felt were unnecessary. This also may not make sense, but I tried to change up the names of stuff as to keep myself in compliance.
First a small syntactic error:
You mustn't have a comma before the "AS Non_client "
Then what you are trying to do is make 1 value equal 2 different things for the same column which can never be true:
IB1.Customer_Type for 1 record can never be equal to "Client" and "NonClient" simultaneously.
The key here is that 1 customer can have multiple records and the records can differ in the customer_type. So to use that we need to join those records together which is easy since they share a Customer_ID:
Select TB1.Customer_ID,
IB1.Customer_Type AS Client,
IB2.Customer_Type AS Non_client
From Client TB1,
Client_ReF XB1,
Client_Instr IB1,
Client_Instr IB2,
Client_XREC FB1
Where XB1.Client_NO = TB1.Client_NO
AND FB1.Client_ACCT = TB1.ACCT
AND XB1.Client_Instruct_NO = IB1.Client_Instruct_NO
AND FB1.Customer_ID= TB1.Client_NO
AND IB1.Client = 'Client'
AND XB1.Client_Instruct_NO = IB2.Client_Instruct_NO
AND IB2.Non_Client = 'NonClient';
The above may not actually work, as I don't fully understand your data and structures, but it should put you on the right path. Particularly around the join of IB2 with XB1: you might have to join IB2 with all the same tables as IB1.
A better way than that, however, and I'll leave you to research it, is using the EXISTS clause. The difference is that the above joins all records for the same customer together, whereas EXISTS is satisfied as soon as there is at least one "NonClient" record.
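Both variants can be sketched on a simplified two-column table (sqlite3; table name and values are illustrative, not the poster's schema):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE customer_types (customer_id INTEGER, customer_type TEXT)")
con.executemany("INSERT INTO customer_types VALUES (?, ?)", [
    (2345, "Client"), (2345, "NonClient"),  # has both types
    (9999, "Client"),                       # only one type
])

# Self-join: one row per customer that has BOTH types, each type in its own column.
joined = con.execute("""
    SELECT a.customer_id, a.customer_type AS client, b.customer_type AS non_client
    FROM customer_types a
    JOIN customer_types b ON b.customer_id = a.customer_id
    WHERE a.customer_type = 'Client' AND b.customer_type = 'NonClient'
""").fetchall()

# EXISTS: keep 'Client' rows only when a 'NonClient' row also exists for the customer.
exists = con.execute("""
    SELECT a.customer_id
    FROM customer_types a
    WHERE a.customer_type = 'Client'
      AND EXISTS (SELECT 1 FROM customer_types b
                  WHERE b.customer_id = a.customer_id
                    AND b.customer_type = 'NonClient')
""").fetchall()

print(joined)  # [(2345, 'Client', 'NonClient')]
print(exists)  # [(2345,)]
```

Customer 9999 is excluded by both versions; the EXISTS form has the advantage that it never multiplies rows when a customer has several records of the same type.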