Best way to count this Data - sql

In short I have 2 tables:
USERS:
------------------------
UserID | Name
------------------------
0 a
1 b
2 c
CALLS:
------------------------
ToUser | Result
------------------------
0 ANSWERED
1 ENGAGED
1 ANSWERED
0 ANSWERED
Etc., etc. (in reality I use a numerical reference for the result)
I have over 2 million records, each detailing a call to a specific client. Currently I'm using CASE statements to count each occurrence of a particular result AFTER I have already done the quick total count:
COUNT(DISTINCT l_call_log.line_id),
COALESCE (SUM(CASE WHEN l_call_log.line_result = 1 THEN 1 ELSE NULL END), 0) AS [Answered],
COALESCE (SUM(CASE WHEN l_call_log.line_result = 2 THEN 1 ELSE NULL END), 0) AS [Engaged],
COALESCE (SUM(CASE WHEN l_call_log.line_result = 4 THEN 1 ELSE NULL END), 0) AS [Unanswered]
Am I doing 3 scans of the data after my initial total count? If so, is there a way I can do one sweep and count the calls per result in one go?
Thanks.

This would take one full table scan.
EDIT: There's not enough information to answer. Because of the duplicate removal (DISTINCT), which I missed earlier, we can't tell what strategy would be used, especially without knowing the database engine.
In just about every major query engine, each aggregate function is evaluated once per row for its column as the rows are read, and the engine may use a cached result (for COUNT(*), for example).
Is line_result indexed? If so, you could use a different query (GROUP BY + COUNT(*)) to take advantage of index statistics, though I'm not sure that's worthwhile, depending on the other tables in your query.

There is the GROUP BY construction in SQL. Try:
SELECT l_call_log.line_result, COUNT(DISTINCT l_call_log.line_id)
FROM l_call_log
GROUP BY l_call_log.line_result

I would guess it's a single table scan, since you don't have any dependent subqueries. Run EXPLAIN on the query to be sure.
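The two approaches discussed above (conditional aggregates in one SELECT, and GROUP BY with COUNT(*)) can be compared on a tiny data set. This is a minimal sketch, assuming an in-memory SQLite database as a stand-in for the unnamed engine; the sample rows are hypothetical, and only the table and column names (l_call_log, line_id, line_result) and the result codes 1, 2 and 4 come from the question.

```python
import sqlite3

# Hypothetical in-memory stand-in for the l_call_log table from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE l_call_log (line_id INTEGER, line_result INTEGER)")
conn.executemany(
    "INSERT INTO l_call_log VALUES (?, ?)",
    [(1, 1), (2, 2), (3, 1), (4, 1), (5, 4)],  # made-up sample calls
)

# One statement: the total and the three per-result counts come from the same query.
one_pass = conn.execute(
    """
    SELECT COUNT(DISTINCT line_id),
           COALESCE(SUM(CASE WHEN line_result = 1 THEN 1 END), 0) AS Answered,
           COALESCE(SUM(CASE WHEN line_result = 2 THEN 1 END), 0) AS Engaged,
           COALESCE(SUM(CASE WHEN line_result = 4 THEN 1 END), 0) AS Unanswered
    FROM l_call_log
    """
).fetchone()

# The GROUP BY alternative returns one row per result code instead of one column.
grouped = dict(
    conn.execute(
        "SELECT line_result, COUNT(*) FROM l_call_log GROUP BY line_result"
    ).fetchall()
)

print(one_pass)
print(grouped)
```

Both forms produce the same counts here. How many passes a particular engine makes, and how it handles the DISTINCT, still has to be checked with that engine's own EXPLAIN output.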

Related

Max match same numbers from each row

Generating a report of 1 million rows with the script below takes almost 2 days, so I would really appreciate help with a different script that can generate the report within 10-15 minutes.
The requirements of the report are as follows:
Table “cover” contains 5 million rows and 6 columns of data, and table “data” contains 500,000 rows and 6 columns.
Each row in table cover has to be compared against every row in table data to find the maximum number of matches.
For instance, as shown in the tables below, there could be 3 matches in row #1, 2 matches in row #2 and 5 matches in row #3, so the script has to select the maximum, which is 5 in row #3.
Sample table
UPDATE public.cover_sheet AS fc
SET maxmatch = (SELECT MAX(tmp.mtch)
FROM (
SELECT (SELECT CASE WHEN fc.a=drwo.a THEN 1 ELSE 0 END) +
(SELECT CASE WHEN fc.b=drwo.b THEN 1 ELSE 0 END) +
(SELECT CASE WHEN fc.c=drwo.c THEN 1 ELSE 0 END) +
(SELECT CASE WHEN fc.d=drwo.d THEN 1 ELSE 0 END) +
(SELECT CASE WHEN fc.e=drwo.e THEN 1 ELSE 0 END) +
(SELECT CASE WHEN fc.f=drwo.f THEN 1 ELSE 0 END) AS mtch
FROM public.data AS drwo
) AS tmp)
WHERE fc.code>0;
SELECT *
FROM public.cover_sheet AS fc
WHERE fc.maxmatch>0;
As @a_horse_with_no_name mentioned in the comment to the question, your question is not clear...
It seems you want to get the number of records in which all 6 fields are equal in both tables.
I'd suggest that you:
reduce the number of select statements, which will speed up query execution,
split your query into a few smaller ones (good practice) to check your logic,
use a join to get matching data, see: Visual Representation of SQL Joins
use a subquery or CTE to get a result from which you can update the table.
I think you want to get a result as follows:
SELECT COUNT(*) mtch
FROM public.cover_sheet AS fc INNER JOIN public.data AS drwo ON
fc.a=drwo.a AND fc.b=drwo.b AND fc.c=drwo.c AND fc.d=drwo.d AND fc.e=drwo.e AND fc.f=drwo.f
If I'm not wrong and the above query is correct, its execution time will drop to about 1-2 minutes.
Finally, the update query may look like:
WITH qry AS
(
-- proper select statement here
)
UPDATE public.cover_sheet AS fc
SET maxmatch = qry.<fieldname>
FROM qry
WHERE fc.code>0 AND fc.<key> = qry.<key>;
Note:
I cannot see your data and I know nothing about its structure, relationships, etc., so you will have to adapt the above query to your needs.
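For comparison, the per-column counting from the question's UPDATE can also be written as a join plus GROUP BY, which is one candidate for the "proper select statement" placeholder. This is only a sketch, assuming SQLite as a stand-in engine and hypothetical sample rows; it keeps the question's logic (count how many of the six columns agree, then take the maximum per cover row) rather than requiring all six columns to be equal. It still compares every cover row with every data row, so it illustrates the logic, not a guaranteed speed-up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Table and column names follow the question; the rows are made up.
conn.executescript(
    """
    CREATE TABLE cover_sheet (code INTEGER PRIMARY KEY, a, b, c, d, e, f);
    CREATE TABLE data (a, b, c, d, e, f);
    INSERT INTO cover_sheet VALUES (1, 1, 2, 3, 4, 5, 6);
    INSERT INTO cover_sheet VALUES (2, 9, 9, 9, 9, 9, 9);
    INSERT INTO data VALUES (1, 2, 3, 0, 0, 0);  -- 3 columns match code 1
    INSERT INTO data VALUES (1, 2, 0, 0, 0, 0);  -- 2 columns match code 1
    INSERT INTO data VALUES (1, 2, 3, 4, 5, 0);  -- 5 columns match code 1
    """
)

# For every cover row, count matching columns against each data row,
# then keep the highest count per cover row.
max_matches = conn.execute(
    """
    SELECT fc.code,
           MAX(
               CASE WHEN fc.a = d.a THEN 1 ELSE 0 END +
               CASE WHEN fc.b = d.b THEN 1 ELSE 0 END +
               CASE WHEN fc.c = d.c THEN 1 ELSE 0 END +
               CASE WHEN fc.d = d.d THEN 1 ELSE 0 END +
               CASE WHEN fc.e = d.e THEN 1 ELSE 0 END +
               CASE WHEN fc.f = d.f THEN 1 ELSE 0 END
           ) AS maxmatch
    FROM cover_sheet AS fc
    CROSS JOIN data AS d
    WHERE fc.code > 0
    GROUP BY fc.code
    ORDER BY fc.code
    """
).fetchall()

print(max_matches)
```

With these sample rows, code 1 gets a maximum of 5 (the third data row) and code 2 gets 0, mirroring the 3 / 2 / 5 example in the question.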

sql sum up connection time in call detail records table

Let's say I've got a Customers Detail Records table which has these columns:
UserAId
UserBId
Duration
Impulses
For example:
UserAId UserBId Duration Impulses
1 2 30 5
1 2 20 3
2 1 10 2
2 3 5 1
OK, now I would like to write a query which aggregates the total Duration, the total Impulses and the count of calls between users, regardless of direction, so that the result looks like:
UserAId UserBId TotalDuration TotalImpulses TotalCallsCount
1 2 60 10 3
2 3 5 1 1
Is it possible? If so, how can I do this? Thanks for the help.
Of course, if you execute a query like this:
SELECT
UserAId,
UserBId,
SUM(Duration) AS TotalDuration,
SUM(Impulses) AS TotalImpulses,
COUNT(*) AS TotalCallsCount
FROM CustomerDetail
GROUP BY UserAId, UserBId
... you will not get what you want. That is because this query does not aggregate and combine the rows that have UserAId=1 and UserBId=2 with those that have UserAId=2 and UserBId=1.
To do what you want you need a little trick. What you call UserAId and UserBId in the result set are not actually always what you read on the input table. This query will do what you ask:
SELECT
CASE WHEN UserAId<UserBId THEN UserAId ELSE UserBId END AS User_AId,
CASE WHEN UserAId<UserBId THEN UserBId ELSE UserAId END AS User_BId,
SUM(Duration) AS TotalDuration,
SUM(Impulses) AS TotalImpulses,
COUNT(*) AS TotalCallsCount
FROM CustomerDetail
GROUP BY
CASE WHEN UserAId<UserBId THEN UserAId ELSE UserBId END,
CASE WHEN UserAId<UserBId THEN UserBId ELSE UserAId END
... it works even if UserAId=UserBId (you did not state whether those two values can be the same). You will always get the lesser of the 2 Ids as User_AId and the greater of the 2 Ids as User_BId, even if that combination does not exist as (UserAId, UserBId) anywhere in the table (in which case it exists only as (UserBId, UserAId)).
I have tested this on SQLFiddle here.
I am no SQL-Server expert. Some engines do allow the GROUP BY clause to reference calculated columns defined in the SELECT expression list without having to redefine them explicitly. This is non standard SQL, but it does make the SQL much more readable. Not sure if SQL-Server supports some sort of syntax for this.
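One standard-SQL way to avoid repeating the CASE expressions in GROUP BY is to compute the normalized pair in a derived table and group in the outer query. The following is a minimal sketch, assuming SQLite as a stand-in engine; the table name, column names and rows are the ones from the question, while the derived-table alias `pairs` is made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CustomerDetail "
    "(UserAId INTEGER, UserBId INTEGER, Duration INTEGER, Impulses INTEGER)"
)
# Sample rows copied from the question.
conn.executemany(
    "INSERT INTO CustomerDetail VALUES (?, ?, ?, ?)",
    [(1, 2, 30, 5), (1, 2, 20, 3), (2, 1, 10, 2), (2, 3, 5, 1)],
)

# The inner query orders each pair (lesser Id first); the outer query groups on it.
totals = conn.execute(
    """
    SELECT User_AId, User_BId,
           SUM(Duration) AS TotalDuration,
           SUM(Impulses) AS TotalImpulses,
           COUNT(*)      AS TotalCallsCount
    FROM (
        SELECT CASE WHEN UserAId < UserBId THEN UserAId ELSE UserBId END AS User_AId,
               CASE WHEN UserAId < UserBId THEN UserBId ELSE UserAId END AS User_BId,
               Duration,
               Impulses
        FROM CustomerDetail
    ) AS pairs
    GROUP BY User_AId, User_BId
    ORDER BY User_AId, User_BId
    """
).fetchall()

print(totals)
```

The result matches the table requested in the question: one row for the pair (1, 2) with 60 / 10 / 3 and one row for (2, 3) with 5 / 1 / 1.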

select sum(a), sum(b where c=1) from db; sql conditions in select statement

I guess I just lack the keywords to search for, but this is burning on my mind:
How can I add a condition to the sum function in the select statement, like
select sum(a), sum(b where c=1) from db;?
This means I want to see the sum of column a and the sum of column b, but for column b only over the records in which column c has the value 1.
The output of Heidi just says "bad syntax near WHERE". Is there another way?
Thanks in advance and best regards from Berlin, joachim
The exact syntax may differ depending on the database engine, however it will be along the lines of
SELECT
sum(a),
sum(CASE WHEN c = 1 THEN b ELSE 0 END)
FROM
db
select sum(case when c=1 then b else 0 end)
This technique is useful when you need a lot of aggregates on the same set of data - you can query the entire table without applying a where filter, and have a bunch of these which give you aggregated data for a specific filter.
It's also useful when you need a lot of counts based on filters - you can do sums of 1 or 0:
select sum(case when {somecondition} then 1 else 0 end)
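The conditional sum and the 1-or-0 counting variant can be tried together on a few rows. This is a minimal sketch, assuming SQLite as a stand-in engine; the table name db and the columns a, b, c come from the question, while the three sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE db (a INTEGER, b INTEGER, c INTEGER)")
conn.executemany(
    "INSERT INTO db VALUES (?, ?, ?)",
    [(1, 10, 1), (2, 20, 0), (3, 30, 1)],  # made-up sample rows
)

# sum(a) covers every row; the CASE limits the second and third aggregates to c = 1.
sum_a, sum_b_where_c1, count_c1 = conn.execute(
    """
    SELECT SUM(a),
           SUM(CASE WHEN c = 1 THEN b ELSE 0 END),
           SUM(CASE WHEN c = 1 THEN 1 ELSE 0 END)
    FROM db
    """
).fetchone()

print(sum_a, sum_b_where_c1, count_c1)
```

The row with c = 0 contributes to sum(a) but adds 0 to the other two aggregates, so no WHERE filter is needed.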

T-SQL SUM All with a Conditional COUNT

I have a query that produces the following:
Team | Member | Cancelled | Rate
-----------------------------------
1 John FALSE 150
1 Bill TRUE 10
2 Sarah FALSE 145
2 James FALSE 110
2 Ashley TRUE 0
What I need is to select the count of members for a team where cancelled is false and the sum of the rate regardless of cancelled status...something like this:
SELECT
Team,
COUNT(Member), --WHERE Cancelled = FALSE
SUM(Rate) --All Rows
FROM
[QUERY]
GROUP BY
Team
So the result would look like this:
Team | CountOfMember | SumOfRate
----------------------------------
1 1 160
2 2 255
This is just an example. The real query has multiple complex joins. I know I could do one query for the sum of the rate and then another for the count and then join the results of those two together, but is there a simpler way that would be less taxing and not cause me to copy and paste an already complex query?
You want a conditional sum, something like this:
sum(case when cancelled = 'false' then 1 else 0 end)
The reason for using sum() is that it processes every record and adds a value, either 0 or 1, for each one. The value depends on the value of cancelled. When it is false, the sum() increments by 1, which counts the number of such rows.
You can do something similar with count(), like this:
count(case when cancelled = 'false' then cancelled end)
The trick here is that count() counts the number of non-NULL values. The then clause can be anything that is not NULL -- cancelled, the constant 1, or some other field. Without an else, any other value is turned into NULL and not counted.
I have always preferred the sum() version over the count() version, because I think it is more explicit. In other dialects of SQL, you can sometimes shorten it to:
sum(cancelled = 'false')
which, once you get used to it, makes a lot of sense.
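Both the sum() and the count() forms can be checked against the sample data from the question. This is a minimal sketch, assuming SQLite as a stand-in for T-SQL; `team_rates` is a hypothetical table standing in for the [QUERY] placeholder, and Cancelled is stored as the strings 'true' / 'false' purely so that the comparison `cancelled = 'false'` used above works unchanged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# team_rates is a hypothetical stand-in for the [QUERY] placeholder.
conn.execute(
    "CREATE TABLE team_rates (Team INTEGER, Member TEXT, Cancelled TEXT, Rate INTEGER)"
)
conn.executemany(
    "INSERT INTO team_rates VALUES (?, ?, ?, ?)",
    [
        (1, "John", "false", 150),
        (1, "Bill", "true", 10),
        (2, "Sarah", "false", 145),
        (2, "James", "false", 110),
        (2, "Ashley", "true", 0),
    ],
)

# SUM(CASE ...) and COUNT(CASE ...) count the same rows; SUM(Rate) ignores Cancelled.
totals = conn.execute(
    """
    SELECT Team,
           SUM(CASE WHEN Cancelled = 'false' THEN 1 ELSE 0 END) AS CountOfMember,
           COUNT(CASE WHEN Cancelled = 'false' THEN Member END) AS CountViaCount,
           SUM(Rate) AS SumOfRate
    FROM team_rates
    GROUP BY Team
    ORDER BY Team
    """
).fetchall()

print(totals)
```

The two counting columns agree, and the member counts and rate sums match the expected result table in the question (1 and 160 for team 1, 2 and 255 for team 2).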

Complex SQL query on one table

I have forgotten SQL queries, as I have not used them for a long time.
I have the following requirement.
I have a table called match where I keep my competitor details for the matches my team has played against them. Some important fields are:
match_id
competior_id
match_winner_id
ismatchtied
goals_scored_my_team
goals_scored_comp
From this table I want to get the head to head information for all my competitors.
like this
Competitor Matches Wins Losses Draws
A 10 5 4 1
B 8 3 2 1
Draw information is available from ismatchtied, which is set to 'Y' or 'N'.
I want to get all the info from one query. I could get it by executing separate queries and doing complex processing in my server code, but performance would take a hit.
Any help will be hugely appreciated.
cheers,
Saurav
You could use conditional aggregation, involving CASE expressions inside aggregate functions, like this:
SELECT
competitor_id,
COUNT(*) AS Matches,
COUNT(CASE WHEN goals_scored_my_team > goals_scored_comp THEN 1 END) AS Wins,
COUNT(CASE WHEN goals_scored_my_team < goals_scored_comp THEN 1 END) AS Losses,
COUNT(CASE WHEN goals_scored_my_team = goals_scored_comp THEN 1 END) AS Draws
FROM matches
GROUP BY
competitor_id
;
Every CASE above will evaluate to NULL when the condition isn't satisfied. And since COUNT(expr) omits NULLs, every COUNT(CASE ...) in the above query will effectively only count rows that match the corresponding WHEN condition.
So, the first COUNT counts only rows where my team scored more against the competitor, i.e. where my team won. In a similar way, the second and the third CASEs get the numbers of losses and draws.
SELECT m4.competior_id, COUNT(*) AS TotalMatches,
(SELECT COUNT(*) FROM match m1 WHERE m1.goals_scored_my_team > m1.goals_scored_comp AND m1.competior_id = m4.competior_id) AS Wins,
(SELECT COUNT(*) FROM match m2 WHERE m2.goals_scored_comp > m2.goals_scored_my_team AND m2.competior_id = m4.competior_id) AS Losses,
(SELECT COUNT(*) FROM match m3 WHERE m3.goals_scored_my_team = m3.goals_scored_comp AND m3.competior_id = m4.competior_id) AS Draws
FROM match m4
GROUP BY m4.competior_id;
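The conditional-aggregation answer and the correlated-subquery answer can be run side by side to confirm that they agree. This is a minimal sketch, assuming SQLite as a stand-in engine and hypothetical match results; it uses the table name `matches` and the column name `competitor_id` from the first answer for both queries, so the names would need adjusting to the real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE matches (match_id INTEGER PRIMARY KEY, competitor_id TEXT, "
    "goals_scored_my_team INTEGER, goals_scored_comp INTEGER)"
)
# Made-up results: A = 2 wins, 1 loss, 1 draw; B = 0 wins, 1 loss, 1 draw.
conn.executemany(
    "INSERT INTO matches (competitor_id, goals_scored_my_team, goals_scored_comp) "
    "VALUES (?, ?, ?)",
    [("A", 3, 1), ("A", 0, 2), ("A", 1, 1), ("A", 2, 0), ("B", 1, 2), ("B", 2, 2)],
)

# Conditional aggregation: one grouped query, COUNT skips the NULLs from CASE.
conditional = conn.execute(
    """
    SELECT competitor_id,
           COUNT(*) AS Matches,
           COUNT(CASE WHEN goals_scored_my_team > goals_scored_comp THEN 1 END) AS Wins,
           COUNT(CASE WHEN goals_scored_my_team < goals_scored_comp THEN 1 END) AS Losses,
           COUNT(CASE WHEN goals_scored_my_team = goals_scored_comp THEN 1 END) AS Draws
    FROM matches
    GROUP BY competitor_id
    ORDER BY competitor_id
    """
).fetchall()

# Correlated subqueries: each count re-reads the table for the current competitor.
correlated = conn.execute(
    """
    SELECT m4.competitor_id,
           COUNT(*) AS Matches,
           (SELECT COUNT(*) FROM matches m1
             WHERE m1.goals_scored_my_team > m1.goals_scored_comp
               AND m1.competitor_id = m4.competitor_id) AS Wins,
           (SELECT COUNT(*) FROM matches m2
             WHERE m2.goals_scored_comp > m2.goals_scored_my_team
               AND m2.competitor_id = m4.competitor_id) AS Losses,
           (SELECT COUNT(*) FROM matches m3
             WHERE m3.goals_scored_my_team = m3.goals_scored_comp
               AND m3.competitor_id = m4.competitor_id) AS Draws
    FROM matches m4
    GROUP BY m4.competitor_id
    ORDER BY m4.competitor_id
    """
).fetchall()

print(conditional)
print(correlated)
```

Both queries return the same rows for this data. The conditional-aggregation form needs only the one grouped query, whereas the second form issues three extra subqueries per competitor.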