Average of a column with joins

I am trying to compute the average of the time column while using a few joins to filter the rows. The problem is that some rows are duplicated because their conversation has two or more messages. Consider the following tables and rows (PostgreSQL 9.4).
Times Table
|----------|-------------------|-----------|
| time     | conversation      | first     |
|----------|-------------------|-----------|
| 81250    | 1                 | true      |
| 63457    | 2                 | true      |
| 31592    | 3                 | true      |
| 33987    | 4                 | true      |
|----------|-------------------|-----------|
Conversations Table
|----------|-------------|
| id       | active      |
|----------|-------------|
| 1        | true        |
| 2        | true        |
| 3        | true        |
| 4        | true        |
|----------|-------------|
Messages Table
|----------|-------------------|-------------|
| id       | conversation      | agent       |
|----------|-------------------|-------------|
| 33       | 1                 | 31181       |
| 37       | 2                 | 17782       |
| 41       | 3                 | 53132       |
| 44       | 3                 | 53132       |
| 59       | 4                 | 94282       |
|----------|-------------------|-------------|
And I'm trying to write a query that will return the average of time. So for the data above, the output from the query would look like:
|--------------------------------------------|
| Average Time |
|--------------------------------------------|
| 52571 |
|--------------------------------------------|
(81250 + 63457 + 31592 + 33987) / 4 = 210286 / 4 = 52571 (approx.)
This is my current query (something is wrong):
SELECT AVG("Times"."time") AS "Average Time" FROM "Times"
INNER JOIN "Conversations" ON "Conversations"."id" = "Times"."conversation" AND "Conversations"."active" = true
INNER JOIN "Messages" ON "Messages"."conversation" = "Conversations"."id" AND "Messages"."agent" IN ('31181', '17782', '53132', '94282')
WHERE "Times"."first" = true;
This is giving me the following output:
|--------------------------------------------|
| Average Time |
|--------------------------------------------|
| 48375 |
|--------------------------------------------|
(81250 + 63457 + 31592 + 31592 + 33987) / 5 = 241878 / 5 = 48375 (approx.), because the row for conversation 3 is counted twice.
I have tried DISTINCT, GROUP BY and a few other aggregate functions, without success. Here is a sqlfiddle with an example.
http://sqlfiddle.com/#!9/5833fe/8

You can use a nested subquery to avoid duplicates:
SELECT AVG(Times.time) FROM Times
WHERE Times.conversation IN (
    SELECT Conversations.id FROM Conversations
    INNER JOIN Messages ON Messages.conversation = Conversations.id
        AND Messages.agent IN (31181, 17782, 53132, 94282)
    WHERE Conversations.active = true)
AND Times.first = true;
http://sqlfiddle.com/#!17/5833f/5
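To see why the join inflates the average and why the subquery does not, here is a minimal sketch that loads the sample rows from the question into an in-memory SQLite database (an assumption made so the example runs anywhere; the question targets PostgreSQL 9.4) and runs both queries. Booleans are stored as 0/1 because SQLite has no boolean type.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for PostgreSQL; booleans become 0/1.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Times ("time" INTEGER, conversation INTEGER, "first" INTEGER);
CREATE TABLE Conversations (id INTEGER, active INTEGER);
CREATE TABLE Messages (id INTEGER, conversation INTEGER, agent INTEGER);
INSERT INTO Times VALUES (81250, 1, 1), (63457, 2, 1), (31592, 3, 1), (33987, 4, 1);
INSERT INTO Conversations VALUES (1, 1), (2, 1), (3, 1), (4, 1);
INSERT INTO Messages VALUES (33, 1, 31181), (37, 2, 17782), (41, 3, 53132),
                            (44, 3, 53132), (59, 4, 94282);
""")

# Joining Messages yields one row per message, so conversation 3 appears twice.
JOIN_SQL = """
SELECT AVG(Times."time") FROM Times
INNER JOIN Conversations ON Conversations.id = Times.conversation
    AND Conversations.active = 1
INNER JOIN Messages ON Messages.conversation = Conversations.id
    AND Messages.agent IN (31181, 17782, 53132, 94282)
WHERE Times."first" = 1
"""

# IN (...) only tests membership, so each Times row is counted at most once.
SUBQUERY_SQL = """
SELECT AVG(Times."time") FROM Times
WHERE Times.conversation IN (
    SELECT Conversations.id FROM Conversations
    INNER JOIN Messages ON Messages.conversation = Conversations.id
        AND Messages.agent IN (31181, 17782, 53132, 94282)
    WHERE Conversations.active = 1)
AND Times."first" = 1
"""

joined_avg = conn.execute(JOIN_SQL).fetchone()[0]
subquery_avg = conn.execute(SUBQUERY_SQL).fetchone()[0]
print(joined_avg)    # 48375.6
print(subquery_avg)  # 52571.5
```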

Related

Select all rows where rows in another joined table match condition

So I want to select all rows where a subset of rows in another table match the given values.
I have following tables:
Main Profile:
+----+--------+---------------+---------+
| id | name | subprofile_id | version |
+----+--------+---------------+---------+
| 1 | Main 1 | 4 | 1 |
| 2 | Main 1 | 5 | 2 |
| 3 | Main 2 | ... | 1 |
+----+--------+---------------+---------+
Sub Profile:
+---------------+----------+
| subprofile_id | block_id |
+---------------+----------+
| 4 | 6 |
| 4 | 7 |
| 5 | 8 |
| 5 | 9 |
+---------------+----------+
Block:
+----------+-------------+
| block_id | property_id |
+----------+-------------+
| 7 | 10 |
| 7 | 11 |
| 7 | 12 |
| 7 | 13 |
| 8 | 14 |
| 8 | 15 |
| 8 | 16 |
| 8 | 17 |
| ... | ... |
+----------+-------------+
Property:
+----+--------------------+--------------------------+
| id | name | value |
+----+--------------------+--------------------------+
| 10 | Description | XY |
| 11 | Responsible person | Mr. Smith |
| 12 | ... | ... |
| 13 | ... | ... |
| 14 | Description | XY |
| 15 | Responsible person | Mrs. Brown |
| 16 | ... | ... |
| 17 | ... | ... |
+----+--------------------+--------------------------+
The user can define multiple conditions on the property table. For example:
Description = 'XY'
Responsible person = 'Mr. Smith'
I need all 'Main Profiles' with the highest version that match ALL of the given properties; they may of course have additional properties that do not match.
It should be doable in JPA, because I will translate it into QueryDSL to build type-safe, dynamic queries from the user's input.
I have already searched through the questions on similar problems but couldn't apply the answers to mine.
I have also written a query that works quite well but retrieves every row with at least one matching condition. In addition, I need all properties in my set, but it only fetched the matching ones (via a fetch join, which is missing from my code example).
from MainProfile as mainProfile
left join mainProfile.subProfile as subProfile
left join subProfile.blocks as block
left join block.properties as property
where mainProfile.version = (select max(mainProfile2.version) from MainProfile as mainProfile2 where mainProfile2.name = mainProfile.name)
and ((property.name = 'Description' and property.value = 'XY') or (property.name = 'Responsible person' and property.value = 'Mr. Smith'))
Running my query, I got two rows:
Main 1 with version 2
Main 2 with version 1
I would have expected only one row, because 'Responsible person' does not match in 'Main 2'.
EDIT 1:
So I found a solution which works but could be improved:
select distinct mainProfile
from MainProfile as mainProfile
left join mainProfile.subProfile as subProfile
left join subProfile.blocks as block
left join block.properties as property
where mainProfile.version = (select max(mainProfile2.version)from MainProfile mainProfile2 where mainProfile2.name = mainProfile.name)
and ((property.name = 'Description' and property.content = 'XY') or (property.name = 'Responsible person' and property.content = 'Mr. Smith'))
group by mainProfile.id
having count (distinct property) = 2
It does retrieve the right 'Main Profiles'. The problem is that only the two matching properties are fetched, but I need all properties for further processing.
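One possible way around the "only the matching properties are fetched" problem is to move the GROUP BY / HAVING check into a subquery that only returns profile ids, and to join the properties in the outer query without any filter. The sketch below shows the idea in plain SQL on an in-memory SQLite database. It is not JPA or QueryDSL, and the flattened two-table schema (profile, property with a profile_id column) and all rows are hypothetical simplifications of the tables in the question.

```python
import sqlite3

# Hypothetical, simplified schema: properties hang directly off a profile.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE profile (id INTEGER, name TEXT, version INTEGER);
CREATE TABLE property (id INTEGER, profile_id INTEGER, name TEXT, value TEXT);
INSERT INTO profile VALUES (1, 'Main 1', 1), (2, 'Main 1', 2), (3, 'Main 2', 1);
INSERT INTO property VALUES
    (10, 1, 'Description', 'XY'), (11, 1, 'Responsible person', 'Mr. Smith'),
    (12, 2, 'Description', 'XY'), (13, 2, 'Responsible person', 'Mr. Smith'),
    (14, 2, 'Location', 'Building A'),
    (15, 3, 'Description', 'XY'), (16, 3, 'Responsible person', 'Mrs. Brown');
""")

SQL = """
SELECT p.id, p.name, p.version, pr.name, pr.value
FROM profile p
JOIN property pr ON pr.profile_id = p.id          -- unfiltered: all properties
WHERE p.version = (SELECT MAX(p2.version) FROM profile p2 WHERE p2.name = p.name)
  AND p.id IN (
      -- profiles for which BOTH conditions match
      SELECT m.profile_id FROM property m
      WHERE (m.name = 'Description' AND m.value = 'XY')
         OR (m.name = 'Responsible person' AND m.value = 'Mr. Smith')
      GROUP BY m.profile_id
      HAVING COUNT(DISTINCT m.name) = 2)
ORDER BY pr.id
"""

rows = conn.execute(SQL).fetchall()
for row in rows:
    print(row)
# Only profile 2 ('Main 1', version 2) qualifies, and all three of its
# properties are returned, including the non-matching 'Location'.
```

The 2 in the HAVING clause has to equal the number of conditions the user supplied, so a dynamic query builder would need to set it accordingly.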

T-SQL - Turn table with current page and previous pages into a sequential order per session

I'm trying to create a table that shows the activity per session on a website.
It should look something like this.
Preferred table:
+------------+---------+--------------+-----------+
| SessionID | PageSeq| Page | Duration |
+------------+---------+--------------+-----------+
| 1 | 1 | Home | 5 |
| 1 | 2 | Sales | 10 |
| 1 | 3 | Contact | 9 |
| 2 | 1 | Sales | 5 |
| 3 | 1 | Home | 30 |
| 3 | 2 | Sales | 5 |
+------------+---------+--------------+-----------+
Unfortunately, my current dataset has no session ID, but it can be deduced from the time and the path.
Current table:
+------------------+---------+------------+---------------+----------+
| DATE_HOUR_MINUTE | Page | Prev_page | Total_session | Duration |
+------------------+---------+------------+---------------+----------+
| 201801012020 | Home | (entrance) | 24 | 5 |
| 201801012020 | Sales | Home | 24 | 10 |
| 201801012020 | Contact | Sales | 24 | 9 |
| 201801012020 | Sales | (entrance) | 5 | 5 |
| 201801012020 | Home | (entrance) | 35 | 30 |
| 201801012020 | Sales | Home | 35 | 5 |
+------------------+---------+------------+---------------+----------+
What is the best way to turn the current table into the preferred format?
I've searched for nested tables and looped tables, but haven't found anything related to this problem yet.

If you can accept the risk of mixing up sessions that start in the same minute and have the same total session time, this is easy enough to do with a recursive query. The table name Session is assumed here; the column names follow the table in the question.
;WITH sessionTree AS
(
    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS sessionId
         , 1 AS PageSeq
         , *
    FROM Session
    WHERE Prev_page = '(entrance)'
    UNION ALL
    SELECT prev.sessionId
         , prev.PageSeq + 1
         , next.*
    FROM sessionTree prev
    JOIN Session next
      ON next.Total_session = prev.Total_session
     AND next.Prev_page = prev.Page
     AND next.DATE_HOUR_MINUTE >= prev.DATE_HOUR_MINUTE
)
SELECT * FROM sessionTree
ORDER BY sessionId, PageSeq
A sessionId is generated for each row that has (entrance) as Prev_page, with PageSeq = 1. The recursive part then joins visits whose timestamp is not earlier than the previous page's and whose Total_session is the same, using the condition next.Prev_page = prev.Page.
Here's a working example on dbfiddle
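The same recursive idea can be tried locally. The following sketch is an adaptation for SQLite rather than T-SQL: it adds a hypothetical id column so the numbering of sessions is deterministic, and it numbers the entrance rows with a correlated COUNT instead of ROW_NUMBER(). The table name visits is also an assumption; the rows are the ones from the question.

```python
import sqlite3

# Assumptions: SQLite instead of SQL Server, a table named "visits",
# and an extra "id" column that fixes the row order.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE visits (id INTEGER, date_hour_minute INTEGER, page TEXT,
                     prev_page TEXT, total_session INTEGER, duration INTEGER);
INSERT INTO visits VALUES
    (1, 201801012020, 'Home',    '(entrance)', 24, 5),
    (2, 201801012020, 'Sales',   'Home',       24, 10),
    (3, 201801012020, 'Contact', 'Sales',      24, 9),
    (4, 201801012020, 'Sales',   '(entrance)', 5,  5),
    (5, 201801012020, 'Home',    '(entrance)', 35, 30),
    (6, 201801012020, 'Sales',   'Home',       35, 5);
""")

SQL = """
WITH RECURSIVE session_tree
    (session_id, page_seq, date_hour_minute, page, total_session, duration) AS (
    -- anchor: every entrance row starts a new session
    SELECT (SELECT COUNT(*) FROM visits e
            WHERE e.prev_page = '(entrance)' AND e.id <= v.id),
           1, v.date_hour_minute, v.page, v.total_session, v.duration
    FROM visits v
    WHERE v.prev_page = '(entrance)'
    UNION ALL
    -- recursive step: follow prev_page -> page within the same session total
    SELECT t.session_id, t.page_seq + 1,
           n.date_hour_minute, n.page, n.total_session, n.duration
    FROM session_tree t
    JOIN visits n
      ON n.total_session = t.total_session
     AND n.prev_page = t.page
     AND n.date_hour_minute >= t.date_hour_minute
)
SELECT session_id, page_seq, page, duration
FROM session_tree
ORDER BY session_id, page_seq
"""

sessions = conn.execute(SQL).fetchall()
for row in sessions:
    print(row)
# (1, 1, 'Home', 5)
# (1, 2, 'Sales', 10)
# (1, 3, 'Contact', 9)
# (2, 1, 'Sales', 5)
# (3, 1, 'Home', 30)
# (3, 2, 'Sales', 5)
```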

How to perform COUNT with HABTM and left join?

Considering the following associations:
class Pool < ActiveRecord::Base
has_and_belongs_to_many :participations
end
class Participation < ActiveRecord::Base
has_and_belongs_to_many :pools
end
I want to get the number of participations in each pool (even if a pool has no participations).
This is what I am expecting (id is pool id):
+----+----------------------------+
| id | count('participations.id') |
+----+----------------------------+
| 1 | 1 |
| 2 | 0 |
| 3 | 0 |
| 4 | 0 |
| 5 | 0 |
| 6 | 0 |
| 7 | 0 |
| 8 | 0 |
+----+----------------------------+
This is what I get:
+----+----------------------------+
| id | count('participations.id') |
+----+----------------------------+
| 1 | 3 |
| 2 | 1 |
| 3 | 1 |
| 4 | 1 |
| 5 | 1 |
| 6 | 1 |
| 7 | 1 |
| 8 | 1 |
+----+----------------------------+
To obtain this result, I do a left join with a group by and a count:
Pool.joins('LEFT JOIN participations_pools ON participations_pools.pool_id = pools.id
            LEFT JOIN participations ON participations.id = participations_pools.participation_id')
    .select("pools.id, count('participations.id')")
    .group('pools.id')
How can I get the correct result, and why do I get this one?
EDIT:
My answer to my own question:
Pool.joins('LEFT JOIN participations_pools ON participations_pools.pool_id = pools.id
            LEFT JOIN participations ON participations.id = participations_pools.participation_id')
    .select("pools.id, count(participations.id)")
    .group('pools.id')
The quotes inside count() were the cause of my troubles: count('participations.id') counts a string literal, which is never NULL, so every joined row is counted.

If you don't want to worry about that, let ActiveRecord's count method build the aggregate for you:
Pool.joins('LEFT JOIN participations_pools ON participations_pools.pool_id = pools.id')
    .joins('LEFT JOIN participations ON participations.id = participations_pools.participation_id')
    .group('pools.id').count('participations.id')
The result will be a hash with pools.id as the key and the number of participations as the value for each pool.
More info on the count method: http://api.rubyonrails.org/classes/ActiveRecord/Calculations.html#method-i-count
From the documentation: if count is used with group, it returns a Hash whose keys represent the aggregated column, and the values are the respective amounts.
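The difference between counting a quoted literal and counting a column can be reproduced outside Rails. This is a small sketch against an in-memory SQLite database; the three pools and the single participation are made-up rows, and the table names follow the join in the question.

```python
import sqlite3

# Made-up data: three pools, only pool 1 has a participation.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pools (id INTEGER PRIMARY KEY);
CREATE TABLE participations (id INTEGER PRIMARY KEY);
CREATE TABLE participations_pools (pool_id INTEGER, participation_id INTEGER);
INSERT INTO pools VALUES (1), (2), (3);
INSERT INTO participations VALUES (1);
INSERT INTO participations_pools VALUES (1, 1);
""")

TEMPLATE = """
SELECT pools.id, COUNT({arg})
FROM pools
LEFT JOIN participations_pools ON participations_pools.pool_id = pools.id
LEFT JOIN participations ON participations.id = participations_pools.participation_id
GROUP BY pools.id
ORDER BY pools.id
"""

# A quoted argument is a string literal: it is never NULL, so the row that
# the LEFT JOIN produces for an empty pool is counted as well.
literal_counts = conn.execute(TEMPLATE.format(arg="'participations.id'")).fetchall()

# An unquoted argument is the column: NULLs from the LEFT JOIN are skipped.
column_counts = conn.execute(TEMPLATE.format(arg="participations.id")).fetchall()

print(literal_counts)  # [(1, 1), (2, 1), (3, 1)]
print(column_counts)   # [(1, 1), (2, 0), (3, 0)]
```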

Selecting several max() from a table

I will first say that the table structure is (unfortunately) set.
My goal is to select several max() from a query. Let's say I have the following tables:
jobReferenceTable
jobID | jobName    | jobDepartment |
______|____________|_______________|
1     | dishes     | cleaning      |
2     | vacumming  | cleaning      |
3     | mopping    | cleaning      |
4     | countMoney | admin         |
5     | hirePpl    | admin         |
jobList
listID | jobID |
_______|_______|
1      | 1     |
2      | 5     |
3      | 2     |
4      | 4     |
5      | 1     |
6      | 2     |
7      | 3     |
8      | 3     |
9      | 1     |
10     | 5     |
I would like a query that selects the jobIDs in the cleaning department and then shows the most recent jobList ID for each job. My query so far is below, followed by the results I'm hoping to get.
query
SELECT jrt.jobName, jrt.jobDepartment
FROM jobReferenceTable jrt
JOIN jobList jl ON jrt.jobID = jl.jobID
WHERE jrt.jobDepartment = 'cleaning'
results
jobName | jobDepartment | listID |
________|_______________|________|
1 | cleaning | 9 |
2 | cleaning | 6 |
3 | cleaning | 8 |

Try this:
SELECT jrt.jobName, jrt.jobDepartment, MAX(jl.listID) AS listID
FROM jobReferenceTable AS jrt INNER JOIN jobList AS jl ON jrt.jobID = jl.jobID
WHERE jrt.jobDepartment = 'cleaning'
GROUP BY jrt.jobName, jrt.jobDepartment
As far as I can see, you need only one MAX(): the one on listID.
MAX() is an aggregate function, which means every other column in the SELECT list must appear in the GROUP BY clause.
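As a quick check, the grouped query can be run against the sample rows from the question. The sketch below uses an in-memory SQLite database as a stand-in (the question does not name its database) and adds an ORDER BY on jobName only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE jobReferenceTable (jobID INTEGER, jobName TEXT, jobDepartment TEXT);
CREATE TABLE jobList (listID INTEGER, jobID INTEGER);
INSERT INTO jobReferenceTable VALUES
    (1, 'dishes', 'cleaning'), (2, 'vacumming', 'cleaning'),
    (3, 'mopping', 'cleaning'), (4, 'countMoney', 'admin'), (5, 'hirePpl', 'admin');
INSERT INTO jobList VALUES
    (1, 1), (2, 5), (3, 2), (4, 4), (5, 1),
    (6, 2), (7, 3), (8, 3), (9, 1), (10, 5);
""")

SQL = """
SELECT jrt.jobName, jrt.jobDepartment, MAX(jl.listID) AS listID
FROM jobReferenceTable AS jrt
INNER JOIN jobList AS jl ON jrt.jobID = jl.jobID
WHERE jrt.jobDepartment = 'cleaning'
GROUP BY jrt.jobName, jrt.jobDepartment
ORDER BY jrt.jobName  -- added only for a stable output order
"""

latest = conn.execute(SQL).fetchall()
for row in latest:
    print(row)
# ('dishes', 'cleaning', 9)
# ('mopping', 'cleaning', 8)
# ('vacumming', 'cleaning', 6)
```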

SQL AS/400 - Extract product price per store

We have a phone dialer that calls our stores to inform them about the gas price in their region.
We have 3 tables (WBDAPP00, WBDCIE00, WBDCIA00).
WBDAPP00 is where we store information about the call.
DANOID = ID
DA#INT,DA#IND,DA#TEL = phone number
DA#ENV = The number of the group call; we send one message to several stores.
DASTAT = The status of the call (confirmed by the store, canceled, running, confirmed by us, paused)
DADTHR = The timestamp of the last status modification
WBDCIE00 is where we store information about the group of stores.
CIE#EN = ID
CIEDHC = The timestamp at which the price becomes effective; we can call in the morning to say the price will change at 14h30
CIE$OR = The price for regular
CIE$PL = The price for plus
CIE$SP = The price for super
CIE$DI = The price for diesel
WBDCIA00 holds complementary information about WBDAPP00.
CIA#ST = The ID of the store
CIA#AP = The ID of the call
CIA#EN = The ID of the group call
CIABAN = The number of the store's company
This is a sample output of these 3 tables
SELECT * FROM PRDCM/WBDAPP00 WHERE DA#ENV = 17258 OR DA#ENV = 17257
+--------+--------+--------+---------+--------+--------+----------------------------+-----------+--------+
| DANOID | DA#INT | DA#IND | DA#TEL | DA#ENV | DASTAT | DADTHR | DAPARM | DAMUSR |
+--------+--------+--------+---------+--------+--------+----------------------------+-----------+--------+
| 100420 | 1 | 418 | 9600055 | 17257 | 4 | 2012-05-07-09.15.04.768228 |1;2;1;1;1;1| ISALAP |
| 100421 | 1 | 819 | 7346491 | 17258 | 0 | 2012-05-07-09.23.32.362971 |0;4;0;1;0;0| ISALAP |
| 100422 | 1 | 819 | 7624747 | 17258 | 1 | 2012-05-07-09.24.28.042330 |0;3;1;1;0;1| ISALAP |
| 100423 | 1 | 819 | 6377874 | 17258 | 0 | 2012-05-07-09.23.32.803073 |0;3;0;1;0;1| ISALAP |
| 100424 | 1 | 819 | 8742844 | 17258 | 1 | 2012-05-07-09.24.25.347116 |1;1;1;1;0;1| ISALAP |
| 100425 | 1 | 819 | 8255744 | 17258 | 0 | 2012-05-07-09.23.33.207688 |1;3;1;1;0;1| ISALAP |
+--------+--------+--------+---------+--------+--------+----------------------------+-----------+--------+
SELECT * FROM PRDCM/WBDCIE00 WHERE CIE#EN = 17258 OR CIE#EN = 17257
+--------+----------------------------+--------+--------+--------+--------+
| CIE#EN | CIEDHC | CIE$OR | CIE$PL | CIE$SP | CIE$DI |
+--------+----------------------------+--------+--------+--------+--------+
| 17257 | 2012-05-04-17.00.00.000000 | 0 | 0 | 0 | 1,359 |
| 17258 | 2012-05-07-09.30.00.000000 | 1,354 | 0 | 0 | 0 |
+--------+----------------------------+--------+--------+--------+--------+
SELECT * FROM PRDCM/WBDCIA00 WHERE CIA#EN = 17258 OR CIA#EN = 17257
+--------+--------+--------+--------+
| CIA#ST | CIA#AP | CIA#EN | CIABAN |
+--------+--------+--------+--------+
| 96 | 100420 | 17257 | 2 |
| 316 | 100421 | 17258 | 4 |
| 320 | 100422 | 17258 | 3 |
| 321 | 100423 | 17258 | 3 |
| 338 | 100424 | 17258 | 1 |
| 366 | 100425 | 17258 | 3 |
+--------+--------+--------+--------+
This is the relation between tables
CIA#AP = DANOID
CIA#EN = CIE#EN = DA#ENV
I want to extract the last CIE$OR (not 0) and the last CIE$DI (not 0) for each CIA#ST.
The last one is determined by CIEDHC (Desc order).
DASTAT needs to be 1 or 4.
This is an example of what I want to extract from the data above:
+--------+--------+--------+
| CIA#ST | CIE$OR | CIE$DI |
+--------+--------+--------+
| 96 | 0 | 1,359 |
| 316 | 1,354 | 0 |
| 320 | 1,354 | 0 |
| 321 | 1,354 | 0 |
| 338 | 1,354 | 0 |
| 366 | 1,354 | 0 |
+--------+--------+--------+
Or like this one, which is not ideal but acceptable in this case:
+--------+-------------+-------+
| CIA#ST | productType | price |
+--------+-------------+-------+
| 96 | 3 | 1,359 |
| 316 | 6 | 1,354 |
| 320 | 6 | 1,354 |
| 321 | 6 | 1,354 |
| 338 | 6 | 1,354 |
| 366 | 6 | 1,354 |
+--------+-------------+-------+
For those who don't know AS400, FETCH FIRST 1 ROWS ONLY is equivalent to TOP 1 or LIMIT 1.
LAST does not exist in AS400, so I need to replace
SELECT LAST(Column1) AS test FROM table1
with
SELECT Column1,Column2 FROM table1 ORDER BY Column2 DESC FETCH FIRST 1 ROWS ONLY
I have tried a subselect, but you can't use ORDER BY and FETCH FIRST 1 ROWS ONLY in one.
We are on V5R1 without any PTF.
This is an example of an extraction:
SELECT CIA#ST,CIE$OR,CIE$DI,CIEDHC
FROM PRDCM/WBDAPP03
INNER JOIN PRDCM/WBDCIE01 ON CIE#EN = DA#ENV
INNER JOIN PRDCM/WBDCIA01 ON CIA#AP = DANOID
WHERE DASTAT IN (1,4)
ORDER BY CIEDHC,DA#ENV
FETCH FIRST 5 ROWS ONLY
+--------+--------+--------+----------------------------+
| CIA#ST | CIE$OR | CIE$DI | CIEDHC |
+--------+--------+--------+----------------------------+
| 88 | 1,014 | 1,039 | 2010-08-25-09.00.00.000000 |
| 89 | 1,014 | 1,039 | 2010-08-25-09.00.00.000000 |
| 90 | 1,014 | 1,039 | 2010-08-25-09.00.00.000000 |
| 91 | 1,014 | 1,039 | 2010-08-25-09.00.00.000000 |
| 119 | 1,084 | 0 | 2010-08-25-09.00.00.000000 |
| 522 | 1,014 | 1,039 | 2010-08-25-09.00.00.000000 |
+--------+--------+--------+----------------------------+
I'll try all your suggestions.

Frankly, I'm a little twitchy about your schema here - there's some denormalization I'm not happy with, among other things (a multi-value column, really?). But you probably have a limited ability to change it, so... If possible, you should consider upgrading to at least V6R1 (which is what we're on), as the database gets more goodies. Thankfully, you still have CTEs, which will help a bit.
I'm assuming that what you want is the latest price change for a store (given by CIEDHC) with a call for that store in DASTAT as 1 or 4, not given by the call-time (so, what happens if an earlier group-call is 'confirmed' after a later one?). In other words, this isn't the last 'confirmed' change, it's the last 'entered' change.
I'm also assuming you have a 'store' table, with all the actual store ids defined. However, since you didn't list it, I created a CTE to manufacture one. You can (and probably should) swap it out in the resulting statement.
WITH Store (storeId) as (
         SELECT DISTINCT cia#st
         FROM Wbdcia00),
     Price_Change (callGroup, occurredAt, productType, newPrice) as (
         SELECT cie#en, ciedhc, 1, cie$or
         FROM Wbdcie00
         WHERE cie$or > 0
         UNION ALL
         SELECT cie#en, ciedhc, 4, cie$di
         FROM Wbdcie00
         WHERE cie$di > 0),
     Confirmed_Changes (storeId, occurredAt, productType, newPrice) as (
         SELECT WarehouseCall.cia#st, Change.occurredAt,
                Change.productType, Change.newPrice
         FROM Wbdcia00 as WarehouseCall
         JOIN Wbdapp00 as Call
           ON Call.danoid = WarehouseCall.cia#ap
          AND Call.dastat IN (1, 4)
         JOIN Price_Change as Change
           ON Change.callGroup = da#env),
     Latest_Change (storeId, productType, newPrice) as (
         SELECT Actual.storeId, Actual.productType, Actual.newPrice
         FROM Confirmed_Changes as Actual
         EXCEPTION JOIN Confirmed_Changes as Remove
           ON Remove.storeId = Actual.storeId
          AND Remove.productType = Actual.productType
          AND Remove.occurredAt > Actual.occurredAt)
SELECT store.storeId, COALESCE(Regular.newPrice, 0) as regularPrice,
       COALESCE(Diesel.newPrice, 0) as dieselPrice
FROM Store
LEFT JOIN Latest_Change as Regular
  ON Regular.storeId = Store.storeId
 AND Regular.productType = 1
LEFT JOIN Latest_Change as Diesel
  ON Diesel.storeId = Store.storeId
 AND Diesel.productType = 4
Some things to note -
I figured you weren't actually giving a product a price of 0. This means that you're not looking for the individual call that went out, with both prices listed - you're going for the last change that happened, for each product. Which is why I pivoted/unpivoted that table like I did.
Needless to say, this statement reports the last entered change that was 'confirmed'. This is not the last confirmation of a change (indicated by dadthr), however.
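For readers without access to an AS/400, the shape of the statement above can be sketched on SQLite. Several things here are assumptions: the table and column names are simplified stand-ins (SQLite does not accept # or $ in unquoted identifiers), EXCEPTION JOIN is replaced by the equivalent LEFT JOIN ... IS NULL anti-join, prices written with a decimal comma are stored as decimals, and group 17000 with call 100000 is a hypothetical earlier diesel change added only so that the "latest change" step has something to discard.

```python
import sqlite3

# call_log ~ WBDAPP00, price_group ~ WBDCIE00, store_call ~ WBDCIA00 (renamed).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE call_log (danoid INTEGER, group_id INTEGER, dastat INTEGER);
CREATE TABLE price_group (group_id INTEGER, effective_at TEXT, regular REAL, diesel REAL);
CREATE TABLE store_call (store_id INTEGER, danoid INTEGER, group_id INTEGER);
INSERT INTO call_log VALUES
    (100000, 17000, 1),  -- hypothetical earlier call
    (100420, 17257, 4), (100421, 17258, 0), (100422, 17258, 1),
    (100423, 17258, 0), (100424, 17258, 1), (100425, 17258, 0);
INSERT INTO price_group VALUES
    (17000, '2012-05-01-09.00.00', 0, 1.300),  -- hypothetical earlier change
    (17257, '2012-05-04-17.00.00', 0, 1.359),
    (17258, '2012-05-07-09.30.00', 1.354, 0);
INSERT INTO store_call VALUES
    (96, 100000, 17000),  -- hypothetical earlier call
    (96, 100420, 17257), (316, 100421, 17258), (320, 100422, 17258),
    (321, 100423, 17258), (338, 100424, 17258), (366, 100425, 17258);
""")

SQL = """
WITH price_change (group_id, occurred_at, product_type, new_price) AS (
         SELECT group_id, effective_at, 1, regular FROM price_group WHERE regular > 0
         UNION ALL
         SELECT group_id, effective_at, 4, diesel FROM price_group WHERE diesel > 0),
     confirmed (store_id, occurred_at, product_type, new_price) AS (
         SELECT sc.store_id, pc.occurred_at, pc.product_type, pc.new_price
         FROM store_call sc
         JOIN call_log c ON c.danoid = sc.danoid AND c.dastat IN (1, 4)
         JOIN price_change pc ON pc.group_id = c.group_id),
     latest (store_id, product_type, new_price) AS (
         -- anti-join: keep a change only if no later one exists for it
         SELECT a.store_id, a.product_type, a.new_price
         FROM confirmed a
         LEFT JOIN confirmed r
           ON r.store_id = a.store_id
          AND r.product_type = a.product_type
          AND r.occurred_at > a.occurred_at
         WHERE r.store_id IS NULL)
SELECT s.store_id, COALESCE(reg.new_price, 0), COALESCE(die.new_price, 0)
FROM (SELECT DISTINCT store_id FROM store_call) s
LEFT JOIN latest reg ON reg.store_id = s.store_id AND reg.product_type = 1
LEFT JOIN latest die ON die.store_id = s.store_id AND die.product_type = 4
ORDER BY s.store_id
"""

prices = {store: (regular, diesel) for store, regular, diesel in conn.execute(SQL)}
for store, pair in prices.items():
    print(store, pair)
```

With these rows, store 96 reports the later diesel price (1.359, not the hypothetical 1.300), stores 320 and 338 report the regular price, and stores whose calls have DASTAT 0 report 0 for both products.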
