Creating a calculated column (uses aggregate function) in MS Access table

The MS Access SQL query shown below produces a result set. I want to take the completion_metric_query_result column from the result and add it to an actual table as a computed column. Is there any way I can do this?
Previously I attempted to use an update query, but update queries in Access do not work with aggregate functions like AVG().
SELECT
PROMIS_LT_Long_ID.Short_ID,
FORMAT(100 * AVG(IIF(PROMIS_LT_Long_ID.Status = 'Complete', 1.0, 0)), '#,##0.00') AS completion_metric_query_result
FROM
PROMIS_LT
INNER JOIN
PROMIS_LT_Long_ID ON PROMIS_LT.SubjectID = PROMIS_LT_Long_ID.Short_ID
GROUP BY
PROMIS_LT_Long_ID.Short_ID;
Result set:
SubjectID | completion_metric_query_result
----------+-------------------------------
02345800 | 12.00
13938432 | 12.50
13491349 | 0.00
12484028 | 15.00
12993248 | 75.00
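As an aside, the aggregate in that query is simply the share of 'Complete' rows per Short_ID, scaled to a percentage. A minimal Python/SQLite sketch of the same calculation (table name and data are hypothetical; Access's IIF() is written as a CASE expression here):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE responses (Short_ID TEXT, Status TEXT);
INSERT INTO responses VALUES
  ('02345800', 'Complete'), ('02345800', 'Pending'),
  ('13491349', 'Pending'),  ('13491349', 'Pending');
""")
rows = con.execute("""
SELECT Short_ID,
       -- percent of rows per Short_ID whose Status is 'Complete'
       ROUND(100.0 * AVG(CASE WHEN Status = 'Complete' THEN 1.0 ELSE 0 END), 2)
         AS completion_metric
FROM responses
GROUP BY Short_ID
ORDER BY Short_ID
""").fetchall()
print(rows)  # [('02345800', 50.0), ('13491349', 0.0)]
```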

Related

Issue displaying empty value of repeated columns in Google Data Studio

I've got an issue when trying to visualize in Google Data Studio some information from a denormalized table.
Context: I want to gather all the contacts of a company and their related orders in a table in BigQuery. Contacts can have no orders or multiple orders. Following BigQuery best practices, this table is denormalized and all the orders for a client are in arrays of structs. It looks like this:
Fields Examples:
+-------+------------+-------------+-----------+
| Row # | Contact_Id | Orders.date | Orders.id |
+-------+------------+-------------+-----------+
| 1     | 23         | 2019-02-05  | CB1       |
|       |            | 2020-03-02  | CB293     |
| 2     | 2321       | -           | -         |
| 3     | 77         | 2010-09-03  | AX3       |
+-------+------------+-------------+-----------+
The issue is when I want to use this table as a data source in Data Studio.
For instance, if I build a table with Contact_Id as dimension, everything is fine and I can see all my contacts. However, if I add any dimension from the Orders struct, all info from contacts with no orders is not displayed. For instance, all info from Contact_Id 2321 is removed from the table.
Has anyone found a workaround to visualize these empty arrays (for instance as null values)?
The only solution I've found is to build an intermediary table with the orders unnested.
The way I've just discovered to work around this is to add an extra field in my Data Studio → BigQuery connector:
ARRAY_LENGTH(fields.orders) AS numberoforders
This will return zero if the array is empty; you can then create calculated fields within Data Studio, using the numberoforders field to force values to NULL or zero.
You can fix this behaviour by changing your query on the BigQuery connector a little.
Instead of doing this:
SELECT
Contact_id,
Orders
FROM myproject.mydataset.mytable
try this:
SELECT
Contact_id,
IF(ARRAY_LENGTH(Orders) > 0, Orders, [STRUCT(CAST(NULL AS DATE) AS date, CAST(NULL AS STRING) AS id)]) AS Orders
FROM myproject.mydataset.mytable
This way you are forcing your repeated field to have, at least, an array containing NULL values and hence Data Studio will represent those missing values.
Also, if you want to create new calculated fields using one of the nested fields, you should first check whether the value is NULL to avoid filling in all NULL values. For example, if you have a repeated and nested field which can be 1 or 0, and you want to create a calculated field swapping the value, you should do:
IF(myfield.key IS NOT NULL, IF(myfield.key = 1, 0, 1), NULL)
Here you can see what happens if you check before swapping and if you don't:
Original value | No check | Check
1              | 0        | 0
0              | 1        | 1
NULL           | 1        | NULL
1              | 0        | 0
NULL           | 1        | NULL
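The difference between the two variants can be sketched in Python, with None standing in for NULL:

```python
def swap_with_check(key):
    # Mirrors IF(myfield.key IS NOT NULL, IF(myfield.key = 1, 0, 1), NULL):
    # NULL values are preserved instead of being treated as "not 1".
    if key is None:
        return None
    return 0 if key == 1 else 1

def swap_without_check(key):
    # Mirrors the naive IF(myfield.key = 1, 0, 1): NULL falls into the
    # "else" branch and silently becomes 1.
    return 0 if key == 1 else 1

values = [1, 0, None, 1, None]
print([swap_without_check(v) for v in values])  # [0, 1, 1, 0, 1]
print([swap_with_check(v) for v in values])     # [0, 1, None, 0, None]
```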

Get total count and first 3 columns

I have the following SQL query:
SELECT TOP 3 accounts.username
,COUNT(accounts.username) AS count
FROM relationships
JOIN accounts ON relationships.account = accounts.id
WHERE relationships.following = 4
AND relationships.account IN (
SELECT relationships.following
FROM relationships
WHERE relationships.account = 8
);
I want to return the total count of accounts.username and the first 3 accounts.username values (in no particular order). Unfortunately accounts.username and COUNT(accounts.username) cannot coexist. The query works fine after removing one of them. I don't want to send the request twice with different select bodies. The count could reach 1000+, so I would prefer to calculate it in SQL rather than in code.
The current query returns the error Column 'accounts.username' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause., which has not led me anywhere, and this differs from other questions because I do not want to use a GROUP BY clause. Is there a way to do this with FOR JSON AUTO?
The desired output could be:
+-------+----------+
| count | username |
+-------+----------+
| 1551 | simon1 |
| 1551 | simon2 |
| 1551 | simon3 |
+-------+----------+
or
+----------------------------------------------------------------+
| JSON_F52E2B61-18A1-11d1-B105-00805F49916B |
+----------------------------------------------------------------+
| [{"count": 1551, "usernames": ["simon1", "simon2", "simon3"]}] |
+----------------------------------------------------------------+
If you want to display the total count of rows that satisfy the filter conditions (and where username is not null) in an additional column in your resultset, then you could use window functions:
SELECT TOP 3
a.username,
COUNT(a.username) OVER() AS cnt
FROM relationships r
JOIN accounts a ON r.account = a.id
WHERE
r.following = 4
AND EXISTS (
SELECT 1 FROM relationships r1 WHERE r1.account = 8 AND r1.following = r.account
)
;
Side notes:
if username is not nullable, use COUNT(*) rather than COUNT(a.username): this is more efficient since it does not require the database to check every value for nullity
table aliases make the query easier to write, read and maintain
I usually prefer EXISTS over IN (but here this is mostly a matter of taste, as both techniques should work fine for your use case)
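The COUNT(...) OVER () trick is portable to most engines with window functions. A minimal Python/SQLite sketch with hypothetical data (TOP 3 becomes LIMIT 3 in SQLite, which supports window functions as of version 3.25):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE accounts (id INT, username TEXT);
INSERT INTO accounts VALUES
  (1, 'simon1'), (2, 'simon2'), (3, 'simon3'), (4, 'simon4');
""")
rows = con.execute("""
SELECT username,
       COUNT(*) OVER () AS cnt   -- total rows in the result, repeated per row
FROM accounts
ORDER BY id
LIMIT 3                          -- LIMIT applies after the window is computed
""").fetchall()
print(rows)  # [('simon1', 4), ('simon2', 4), ('simon3', 4)]
```

Note that the window count reflects all four matching rows even though only three are returned, which is exactly the behaviour the question asks for.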

using the same aggregate function in the same having clause multiple times

I am trying to filter data I get from a group by query using "having". For this I need to use the same aggregate functions (min(x) and max(x)) multiple times in the having clause. This does not work; I get the error message:
"Orig exception: Code: 184, e.displayText() = DB::Exception: Aggregate function min(strike) is found inside another aggregate function in query, e.what() = DB::Exception"
I tried to define the min and max in the select clause at the top as min_strike and max_strike and use them in the having clause. This does not work either.
Edit: Edited the code, forgot to add the condition.
select
<<structureid and some other variables>>
from
<<a huge query, including multiple joins>>
group by structureid
having min(strike)!=max(strike)
and sumIf(notional,strike=min(strike))>0
The idea is to apply the filters exactly at that point where its grouped.
For example:
structureid | strike | notional
1           | 1.10   | 10
1           | 1.12   | -10
2           | 1.10   | -10
2           | 1.12   | 10
3           | 1.10   | -10
3           | 1.10   | 10
In this case, I want to filter out structureid=2 and only select structureid=1, so that my result table would look like this:
structureid | min strike | max strike
1           | 1.10       | 1.12
Edit: I am doing it with a kind of "self join" now, which does not seem to be implemented in ClickHouse either, so I end up writing
<<a huge query, including multiple joins>>
twice. One of those tables I group by structureid, calculating all the aggregate values. The other is not grouped. I then join them via structureid (using 'all inner join') and apply my filter on the combined table:
select final variables
(
select
<<a lot of unaggregated variables>>
from
<<a huge query, including multiple joins>>
group by structureid
)
all inner join
(
select
<<min(strike) as min_strike, max(strike) as max_strike, and other
aggregate variables>>
from
<<a huge query, including multiple joins>>
group by structureid
)
using structureid
where min_strike!=max_strike and (notional>0 and strike=min_strike)
This seems extremely inefficient. But I cannot think of another solution. Any idea how to implement this better would be very much appreciated!
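The underlying problem is that sumIf(notional, strike = min(strike)) nests one aggregate inside another. The join-back workaround can be sketched in Python with SQLite on the example data (table name hypothetical; ClickHouse's sumIf is written as SUM(CASE ...)):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE positions (structureid INT, strike REAL, notional REAL);
INSERT INTO positions VALUES
  (1, 1.10,  10), (1, 1.12, -10),
  (2, 1.10, -10), (2, 1.12,  10),
  (3, 1.10, -10), (3, 1.10,  10);
""")
rows = con.execute("""
WITH agg AS (                        -- one grouped pass for min/max
  SELECT structureid,
         MIN(strike) AS min_strike,
         MAX(strike) AS max_strike
  FROM positions
  GROUP BY structureid
  HAVING MIN(strike) != MAX(strike)  -- drops structureid 3
)
SELECT a.structureid, a.min_strike, a.max_strike
FROM agg a
JOIN positions p ON p.structureid = a.structureid
GROUP BY a.structureid, a.min_strike, a.max_strike
HAVING SUM(CASE WHEN p.strike = a.min_strike
                THEN p.notional ELSE 0 END) > 0  -- drops structureid 2
""").fetchall()
print(rows)  # [(1, 1.1, 1.12)]
```

Because min_strike is materialized in the CTE before the second aggregation, no aggregate ever nests inside another, which is the rule the original query violated.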

SQL Statement rows to Columns

I have a Table in MS Access like this:
The Columns are:
-------------------------------------------
| *Date* | *Article* | *Distance* | Value |
-------------------------------------------
Date, Article and Distance are Primary Keys, so the combination of them is always unique.
The column Distance has discrete values from 0 to 27.
I need to transform this table into a table like this:
-----------------------------------------------------------------------
| *Date* | *Article* | Value Dis. 0 | Value Dis. 1 | ... | Value Dis. 27 |
-----------------------------------------------------------------------
I really don't know a SQL statement for this task. I needed a really fast solution, which is why I wrote an Excel macro; it worked fine but was very inefficient and needed several hours to complete. Now that the amount of data is 10 times higher, I can't use this macro anymore.
You can try the following pivot query:
SELECT
Date,
Article,
MAX(IIF(Distance = 0, Value, NULL)) AS val_0,
MAX(IIF(Distance = 1, Value, NULL)) AS val_1,
...
MAX(IIF(Distance = 27, Value, NULL)) AS val_27
FROM yourTable
GROUP BY
Date,
Article
Note that Access does not support CASE expressions, but it does offer a function called IIF() which takes the form of:
IIF(condition, value if true, value if false)
which essentially behaves the same way as CASE does in other RDBMSs.
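In engines that support CASE rather than IIF(), the same conditional-aggregation pivot looks like this. A Python/SQLite sketch with a hypothetical table, showing only the first two distance columns:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE measurements (Date TEXT, Article TEXT, Distance INT, Value REAL);
INSERT INTO measurements VALUES
  ('2020-01-01', 'A', 0, 1.5),
  ('2020-01-01', 'A', 1, 2.5),
  ('2020-01-01', 'B', 1, 9.0);
""")
rows = con.execute("""
SELECT Date, Article,
       -- CASE with no ELSE yields NULL, which MAX() ignores,
       -- so each column picks out the value for one Distance
       MAX(CASE WHEN Distance = 0 THEN Value END) AS val_0,
       MAX(CASE WHEN Distance = 1 THEN Value END) AS val_1
FROM measurements
GROUP BY Date, Article
ORDER BY Article
""").fetchall()
print(rows)
# [('2020-01-01', 'A', 1.5, 2.5), ('2020-01-01', 'B', None, 9.0)]
```

Because (Date, Article, Distance) is unique, MAX() sees at most one non-NULL value per cell and simply passes it through.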

How do you replace nulls in a crosstab query with zeroes?

Based on the following SQL in Access...
TRANSFORM Sum([Shape_Length]/5280) AS MILES
SELECT "ONSHORE" AS Type, Sum(qry_CurYrTrans.Miles) AS [Total Of Miles]
FROM qry_CurYrTrans
GROUP BY "ONSHORE"
PIVOT qry_CurYrTrans.QComb IN ('1_HCA_PT','2_HCA_PT','3_HCA_PT','4_HCA_PT');
... my results returned the following datasheet:
| Type    | Total Of Miles | 1_HCA_PT | 2_HCA_PT | 3_HCA_PT | 4_HCA_PT |
| ONSHORE | 31.38          |          | 0.30     | 7.80     |          |
This result is exactly what I want except I want to see zeroes in the cells that are null.
What are some options for doing this? If possible, I'd like to avoid using a subquery. I'd also prefer the query to remain editable in Access' Design View.
I think you have to use the Nz function, which allows you to convert NULLs to another value. Here, I used the optional second argument of the function to say, "If Sum([Shape_Length]/5280) is NULL, set it to 0". You may have to use quotes around the 0; I can't recall.
TRANSFORM Nz(Sum([Shape_Length]/5280), 0) AS MILES
SELECT "ONSHORE" AS Type, Sum(qry_CurYrTrans.Miles) AS [Total Of Miles]
FROM qry_CurYrTrans
GROUP BY "ONSHORE"
PIVOT qry_CurYrTrans.QComb IN ('1_HCA_PT','2_HCA_PT','3_HCA_PT','4_HCA_PT');
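Outside Access, COALESCE plays the same role as Nz. A Python/SQLite sketch of the NULL-to-zero substitution, with the crosstab rewritten as conditional aggregation (table and data hypothetical):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE segments (QComb TEXT, Shape_Length REAL);
INSERT INTO segments VALUES ('2_HCA_PT', 1584.0), ('3_HCA_PT', 41184.0);
""")
rows = con.execute("""
SELECT
  -- no '1_HCA_PT' rows exist, so the SUM is NULL and COALESCE turns it into 0
  COALESCE(SUM(CASE WHEN QComb = '1_HCA_PT' THEN Shape_Length END) / 5280, 0)
    AS hca1,
  COALESCE(SUM(CASE WHEN QComb = '2_HCA_PT' THEN Shape_Length END) / 5280, 0)
    AS hca2
FROM segments
""").fetchall()
print(rows)  # [(0, 0.3)]
```

Wrapping the whole aggregate (not just the column) is what makes the empty cell come out as 0, exactly as Nz(Sum(...), 0) does in the Access answer above.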