Conditional aggregation with JOIN? - sql

I am working in Postgres 9.4. I have a table containing medications, as follows:
bnf_code │ character varying(15) │ not null
pills_per_day │ double precision │
For example, this table might contain a medication with code 04030201 and a recommended pills_per_day of 4, and another with code 04030202 and a recommended pills_per_day of 2.
And I also have a table containing numbers of prescriptions, with a foreign key to the table above:
code │ character varying(15) │ not null
num_pills │ double precision │ not null
processing_date │ date │ not null
practice_id │ character varying(6) │ not null
Foreign-key constraints:
FOREIGN KEY (code) REFERENCES medications(bnf_code) DEFERRABLE INITIALLY DEFERRED
Now I need to work out how many daily doses were prescribed for all codes starting 0403. The daily dose is defined as the number of pills actually prescribed, divided by the recommended pills per day.
I know how to do this for the two particular codes above:
SELECT (SUM(num_pills) FILTER (WHERE code='04030201') / 4) +
(SUM(num_pills) FILTER (WHERE code='04030202') / 2)
FROM prescriptions
But that only works because I can hard-code the pills_per_day values.
Can I extend this to divide by the appropriate pills_per_day for all codes starting 0403? There might be several hundred, but I'd prefer to use a single SQL query if possible.

I am not sure if this is what you are looking for:
SELECT SUM(p.num_pills / m.pills_per_day)
FROM prescriptions p
INNER JOIN medications m
    ON p.code = m.bnf_code
WHERE p.code LIKE '0403%'
I am assuming that num_pills is the number of pills prescribed and pills_per_day is the recommended number of pills per day.
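One caveat: if some rows in medications have a NULL or zero pills_per_day (the column is nullable in the schema above), the division will return NULL or raise a division-by-zero error. A minimal guard, assuming you want to skip those medications:
SELECT SUM(p.num_pills / m.pills_per_day) AS daily_doses
FROM prescriptions p
INNER JOIN medications m
    ON p.code = m.bnf_code
WHERE p.code LIKE '0403%'
  AND m.pills_per_day > 0  -- excludes both NULL and zero values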

Related

How select multiple ex wild card tables in Clickhouse?

I used to use Google BigQuery, where I could select multiple wildcard tables with a query like this:
SELECT *
FROM project.dataset.events_*
WHERE _TABLE_SUFFIX BETWEEN "20220704" AND "20220731"
which selects all tables between these two dates.
Is it possible in ClickHouse to query multiple tables with _TABLE_SUFFIX or an analog, if I only have a bunch of tables like
1. events_20210501
2. events_20210502
3. events_20210503
...
with table engine ReplicatedMergeTree?
Is it possible to create a wildcard analog in ClickHouse?
Yes: the Merge table engine and the merge() table function read from all tables whose names match a regular expression:
https://clickhouse.com/docs/en/engines/table-engines/special/merge
https://clickhouse.com/docs/en/sql-reference/table-functions/merge
create table A1 Engine=Memory as select 1 a;
create table A2 Engine=Memory as select 2 a;
select * from merge(currentDatabase(), '^A.$');
┌─a─┐
│ 1 │
└───┘
┌─a─┐
│ 2 │
└───┘
select * from merge(currentDatabase(), 'A');
┌─a─┐
│ 1 │
└───┘
┌─a─┐
│ 2 │
└───┘
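To reproduce the BigQuery date-range filter, you can combine merge() with its _table virtual column, which holds the name of the source table each row came from. A sketch, assuming the events_YYYYMMDD naming from the question:
SELECT *
FROM merge(currentDatabase(), '^events_[0-9]{8}$')
WHERE _table BETWEEN 'events_20210501' AND 'events_20210503'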

Subtracting DateTime64 from DateTime64 in ClickHouse SQL

I am trying to calculate the time difference in milliseconds between the timestamps of two tables, like this:
SELECT value, (table1.time - table2.time) AS time_delta
but I get an error:
Illegal types DateTime64(9) and DateTime64(9) of arguments of function minus
so I can't subtract DateTime64 values directly in ClickHouse.
As a second approach I tried dateDiff, but that function is limited to SECOND granularity, and I need MILLISECONDS.
This is supported, but I get zeros in the diff, because the difference is too small (a few milliseconds):
SELECT value, dateDiff(SECOND, table1.time, table2.time) AS time_delta
This is not supported:
SELECT value, dateDiff(MILLISECOND, table1.time, table2.time) AS time_delta
What's a better way to solve my problem?
P.S. I also tried converting the values to float, which works but looks strange:
SELECT value, (toFloat64(table1.time) - toFloat64(table2.time)) AS time_delta
As a result I get something like this:
value      time_delta
51167477   -0.10901069641113281
@ditrauth Try casting to Float64, as the subsecond portion you are looking for is stored as a decimal. Also, you want DateTime64(3) for milliseconds; see the docs and the example below:
CREATE TABLE dt (
    start DateTime64(3, 'Asia/Istanbul'),
    end DateTime64(3, 'Asia/Istanbul')
)
ENGINE = MergeTree ORDER BY end;

INSERT INTO dt VALUES (1546300800123, 1546300800125),
                      (1546300800123, 1546300800133);
SELECT
start,
CAST(start, 'Float64'),
end,
CAST(end, 'Float64'),
CAST(end, 'Float64') - CAST(start, 'Float64') AS diff
FROM dt
┌───────────────────start─┬─CAST(start, 'Float64')─┬─────────────────────end─┬─CAST(end, 'Float64')─┬─────────────────diff─┐
│ 2019-01-01 03:00:00.123 │         1546300800.123 │ 2019-01-01 03:00:00.125 │       1546300800.125 │ 0.002000093460083008 │
│ 2019-01-01 03:00:00.123 │         1546300800.123 │ 2019-01-01 03:00:00.133 │       1546300800.133 │ 0.009999990463256836 │
└─────────────────────────┴────────────────────────┴─────────────────────────┴──────────────────────┴──────────────────────┘
2 rows in set. Elapsed: 0.001 sec.
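If you would rather have an exact integer difference than a float (note the 0.002000093... rounding artifact above), toUnixTimestamp64Milli should also work on DateTime64(3) columns; a sketch against the same dt table:
SELECT
    start,
    end,
    toUnixTimestamp64Milli(end) - toUnixTimestamp64Milli(start) AS diff_ms
FROM dt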

Is this the correct way to use it via grafana?

ClickHouse:
┌─name──────────┬─type──────────┐
│ FieldUUID     │ UUID          │
│ EventDate     │ Date          │
│ EventDateTime │ DateTime      │
│ Metric        │ String        │
│ LabelNames    │ Array(String) │
│ LabelValues   │ Array(String) │
│ Value         │ Float64       │
└───────────────┴───────────────┘
Row 1:
──────
FieldUUID: 499ca963-2bd4-4c94-bc60-e60757ccaf6b
EventDate: 2021-05-13
EventDateTime: 2021-05-13 09:24:18
Metric: cluster_cm_agent_physical_memory_used
LabelNames: ['host']
LabelValues: ['test01']
Value: 104189952
Grafana:
SELECT
EventDateTime,
Value AS cluster_cm_agent_physical_memory_used
FROM
$table
WHERE
Metric = 'cluster_cm_agent_physical_memory_used'
AND $timeFilter
ORDER BY
EventDateTime
no data points.
question: Is this the correct way to use it via grafana?
Example:
cluster_cm_agent_physical_memory_used{host='test01'} 104189952
Grafana expects your SQL to return time-series-shaped data for most visualizations:
one column of type DateTime, Date, DateTime64, or UInt32 that describes the timestamp;
one or several columns with numeric types (Float*, Int*, UInt*) holding the metric values (the column name is used as the time-series name);
optionally, one String column that describes multiple time-series names.
There is also an advanced "time series" format, where the first column is the timestamp and the second column is Array(Tuple(String, Numeric)), with the String element holding the time-series name.
So, select metrics.shell as the table and EventDateTime as the timestamp field in the query editor drop-downs; your query could be changed to
SELECT
EventDateTime,
Value AS cluster_cm_agent_physical_memory_used
FROM
$table
WHERE
Metric = 'cluster_cm_agent_physical_memory_used'
AND $timeFilter
ORDER BY
EventDateTime
The SQL query from your post can be visualized without changes only with the Table plugin; change "time series" to "table" format so the data is transformed properly on the Grafana side.
The analog of the PromQL query
cluster_cm_agent_physical_memory_used{host='test01'} 104189952
would look like
SELECT
EventDateTime,
Value AS cluster_cm_agent_physical_memory_used
FROM
$table
WHERE
Metric = 'cluster_cm_agent_physical_memory_used'
AND LabelValues[indexOf(LabelNames,'host')] = 'test01'
AND $timeFilter
ORDER BY
EventDateTime
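If you want one series per host instead of hard-coding host='test01', the optional String column described above can carry the series name. A sketch, assuming the same schema:
SELECT
    EventDateTime,
    Value,
    LabelValues[indexOf(LabelNames, 'host')] AS series_name
FROM
    $table
WHERE
    Metric = 'cluster_cm_agent_physical_memory_used'
    AND $timeFilter
ORDER BY
    EventDateTime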

Select where 2 columns selected are from 1 column of table but with different conditions

I am going in circles on a query and would appreciate your help, as I'm very new at this. I'm using PostgreSQL 9.5.8.
What I'm trying to do:
I want to work out the percentage of partial sales over full sales.
The part I'm completely messing up is the final SELECT, where I select the same column, COUNT(sale_id), twice from the table sale, but under two different conditions (one with a WHERE and one without) to create two new columns. Do I need to join these as separate tables instead?
The desired result should be a percentage.
This is what I have (but I'm of course getting a bunch of errors):
SELECT ROUND(percentage_partial_sale, 1) as "percentage_partial"
FROM (
SELECT count_partial_sale / count_all_sales as percentage_partial_sale
FROM (
SELECT COUNT(sale_id) FROM sale WHERE sale.is_partial=true as "count_partial_sale",
COUNT(sale_id) FROM sale as "total_sellouts");
If you could express the solution in layman's terms that would be helpful. Feel free to make changes as you wish.
Many thanks for your help.
Use CASE WHEN to count conditionally:
select
count(*) as all_sales,
count(case when is_partial then 1 end) as partial_sales,
count(case when is_partial then 1 end)::decimal / count(*)::decimal * 100.0 as ratio
from sale;
You can simply compute the average value of is_partial cast to integer (false is 0, true is 1):
[local] #= CREATE TABLE sale (is_partial boolean);
CREATE TABLE
[local] #= INSERT INTO sale VALUES (false), (false), (true);
INSERT 0 3
[local] #= SELECT AVG(is_partial::int) FROM sale;
┌────────────────────────┐
│ avg │
├────────────────────────┤
│ 0.33333333333333333333 │
└────────────────────────┘
(1 row)
Time: 6,012 ms
If your use case can't be done using AVG, you can use FILTER to remove rows from an aggregate function:
[local] #= SELECT COUNT(*) FILTER (WHERE is_partial) / COUNT(*) :: float
FROM sale;
┌───────────────────┐
│ ?column? │
├───────────────────┤
│ 0.333333333333333 │
└───────────────────┘
(1 row)
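To get the rounded percentage the question asked for, either approach can be wrapped in ROUND. Note that the two-argument ROUND(value, digits) in Postgres requires numeric rather than float, hence the ::numeric cast in this sketch:
SELECT ROUND(COUNT(*) FILTER (WHERE is_partial)::numeric / COUNT(*) * 100, 1) AS percentage_partial
FROM sale;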

How to count items by day

I need to build a report from our ticket database showing the number of tickets closed per day by tech. My SQL query looks something like this:
select
    i.DT_CLOSED,
    rp.NAME
from INCIDENTS i
join REPS rp on (rp.ID = i.id_assignee)
where i.DT_CLOSED > #StartDate
DT_CLOSED is the date and time in ISO format, and NAME is the rep name. I also have a calculated value in my dataset called TICKETSDAY, computed as =DateValue(Fields!DT_CLOSED.Value), which gives me the day without the time.
Right now I have a table set up that is grouped by NAME, then by TICKETSDAY, and I would like the last column to be a count of how many tickets there are. But when I set the last column to =Count(DT_CLOSED), it lists a 1 on each row for each ticket instead of aggregating, so my table looks like this:
┌───────────┬───────────┬──────────────┐
│Name │Day │Tickets Closed│
├───────────┼───────────┼──────────────┤
│JOHN SMITH │11/01/2013 │ 1│
│ │ ├──────────────┤
│ │ │ 1│
│ │ ├──────────────┤
│ │ │ 1│
│ ├───────────┼──────────────┤
│ │11/02/2013 │ 1│
└───────────┴───────────┴──────────────┘
And I need it to be:
┌───────────┬───────────┬──────────────┐
│Name │Day │Tickets Closed│
├───────────┼───────────┼──────────────┤
│JOHN SMITH │11/01/2013 │ 3│
│ ├───────────┼──────────────┤
│ │11/02/2013 │ 1│
└───────────┴───────────┴──────────────┘
Any idea what I'm doing wrong? Any help would be greatly appreciated.
I believe that Marc B is correct. You need to group by the non-aggregate columns in your SELECT statement. Try something along these lines.
select
    i.DT_CLOSED,
    rp.NAME,
    COUNT(i.ID)
from INCIDENTS i
join REPS rp on (rp.ID = i.id_assignee)
where i.DT_CLOSED > #StartDate
GROUP BY rp.NAME, i.DT_CLOSED
Without a GROUP BY to aggregate your rows together, your query counts each row distinctly. I'm unfamiliar with how the report builder works, but try adding the GROUP BY clause manually and see what you get.
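One caveat: grouping by the raw DT_CLOSED timestamp still produces one group per distinct time, not per day. To get one row per day you would truncate to the date first; a sketch, assuming your database supports CAST ... AS DATE:
select
    rp.NAME,
    cast(i.DT_CLOSED as date) as CLOSED_DAY,
    count(i.ID) as TICKETS_CLOSED
from INCIDENTS i
join REPS rp on (rp.ID = i.id_assignee)
where i.DT_CLOSED > #StartDate
group by rp.NAME, cast(i.DT_CLOSED as date)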
Let me know if I can clarify anything.