Adding another column based on different criteria (SQL-server)

Adding another column based on different criteria (SQL-server) - sql

I do quite a bit of data analysis and use SQL on a daily basis but my queries are rather simple, usually pulling a lot of data which I thereafter manipulate in excel, where I'm a lot more experienced.
This time though I'm trying to generate some Live Charts which have as input a single SQL query. I will now have to create complex tables without the aid of the excel tools I'm so familiar with.
The problem is the following:
We have telesales agents that book appointments by answering to inbound calls and making outbound cals. These will generate leads that might potentially result in a sale. The relevant tables and fields for this problem are these:
Contact Table
Agent
Sales Table
Price
OutboundCallDate
I want to know for each telesales agent their respective Total Sales amount in one column, and their outbound sales value in another.
The end result should look something like this:
+-------+------------+---------------+
| Agent | TotalSales | OutboundSales |
+-------+------------+---------------+
| Tom | 30145 | 0 |
| Sally | 16449 | 1000 |
| John | 10500 | 300 |
| Joe | 50710 | 0 |
+-------+------------+---------------+
With the below SQL I get the following result:
SELECT contact.agent, SUM(sales.price)
FROM contact, sales
WHERE contact.id = sales.id
GROUP BY contact.agent
+-------+------------+
| Agent | TotalSales |
+-------+------------+
| Tom | 30145 |
| Sally | 16449 |
| John | 10500 |
| Joe | 50710 |
+-------+------------+
I want to add the third column to this query result, in which the price is summed only for records where the OutboundCallDate field contains data. Something a bit like (where sales.OutboundCallDate is Not Null)
I hope this is clear enough. Let me know if that's not the case.

Use CASE
SELECT c.Agent,
SUM(s.price) AS TotalSales,
SUM(CASE
WHEN s.OutboundCallDate IS NOT NULL THEN s.price
ELSE 0
END) AS OutboundSales
FROM contact c, sales s
WHERE c.id = s.id
GROUP BY c.agent

I think the code would look
SELECT contact.agent, SUM(sales.price)
FROM contact, sales
WHERE contact.id = sales.id AND SUM(WHERE sales.OutboundCallDate)
GROUP BY contact.agent

notI'm assuming your Sales table contains something like Units and Price. If it's just a sales amount, then replace the calculation with the sales amount field name.
The key thing here is that the value summed should only be the sales amount if the OutboundCallDate exists. If the OutboundCallDate is not NULL, then we're using a value of 0 for that row.
select Agent.Agent, TotalSales = sum (sales.Price*Units)
, OutboundSales = sum (
case when Outboundcalldate is not null then price*Units
else 0
end)
From Sales inner join Agent on Sales.Agent = Agent.Agent
Group by Agent.Agent

Related

SQL - Identify if a user is present every month

I am performing some data analysis on users who have made transactions over the course of three months.
What I would like to do is identify customers who made specific transaction types (Credit) in every single month present in the data table over those two years. As you can see in the data table below, User A has performed a Credit transaction in months 1,2,3 and I would like a flag saying "Frequent" applied to the customer.
User B, however, has not performed a credit transaction every month (month 2 was Debit), and so I would like them to have a different flag name (e.g. "Infrequent").
How can I use SQL to identify if a user has made a specific transaction type each month?
| Date | User | Amount | Transaction Type | **Flag ** |
| 2022-01-15 | A | $15.00 | Credit | **Flag ** |
...
| 2022-02-15 | A | $15.00 | Credit | **Flag ** |
...
| 2022-03-15 | A | $15.00 | Credit | **Flag ** |
...
...
| 2022-01-15 | B | $15.00 | Credit | **Flag ** |
...
| 2022-02-15 | B | $15.00 | Debit | **Flag ** |
...
| 2022-03-15 | B | $15.00 | Credit | **Flag ** |
I have tried the following - hoping there is a better or more simple way.
SELECT
Date, User, Amount, Transaction_Type,
CASE WHEN Count(present) = 3 THEN 'Frequent' ELSE 'Infrequent'
FROM Transactions
LEFT JOIN (
SELECT
User,Month(Date),Count(Transaction_Type) as present
FROM
Transactions
WHERE
Transaction_Type = 'Credit'
GROUP BY
User,Month(Date)
Having
Count(Transaction_Type) > 0
) subquery
ON subquery.User = Transaction.User
GROUP BY
Date,User,Amount,Transaction_Type

That is the way I would approach it. Assuming you are using T-SQL I would make the following changes. Instead of having the LEFT JOIN be to a sub-query, I would make the sub-query a CTE and then joint to that. I find it easier to grok when the main query is not full of sub-queries and you can test the CTE on its own more easily, plus if performance becomes an issue is relatively trivial to convert the CTE to a temp table. without affecting the main query too much.
You have a couple of problems I think. the first is that your subquery is going to return you the count of the credits in each month. If I make 3 credits in January this is going to flag me as frequent because the total is more than 3. You probably want to do a
COUNT(DISTINCT Transaction_type) AS hasCredit
to identify if there is AT LEAST ONE credit transaction, then have another aggregation that
SUM(hasCredit)
to get the number of months in which a credit appears.
using nested sub-queries means your LEFT JOIN would now be two sub-queries deep and dissapearing off the right hand side of your screen. Writing them as CTEs keeps the main logic clean and script narrow.
I think this does what you need, but can't test it because I don't have any sample data.
WITH CTE_HasCredit AS
(
SELECT
User
,Month(Date) AS [TransactionMonth]
,Count(DISTINCT Transaction_Type) AS [hasCredit]
FROM
Transactions
WHERE
Transaction_Type = 'Credit'
GROUP BY
User
,Month(Date)
Having
Count(Transaction_Type) > 0
)
,
CTE_isFrequent AS
(
SELECT
User
,SUM(hasCredit) AS [TotalCredits]
FROM
CTE_HasCredit
GROUP BY
User
)
SELECT
TXN.Date
, TXN.User
, TXN.Amount
, TXN.Transaction_Type
,CASE
WHEN FRQ.TotalCredits >= 3 THEN 'Frequent'
ELSE 'Infrequent'
END AS [customerType]
FROM
Transactions AS TXN
LEFT JOIN
CTE_isFrequent AS FRQ ON FRQ.User = TXN.User
GROUP BY
TXN.Date
,TXN.User
,TXN.Amount
,TXN.Transaction_Type
I don't think you need the GROUP BY on the main query either; it would de-dupe transactions for the same day for the same amount.
You might also want to look at the syntax for COUNT() OVER(). These would allow you to do the calculations in the main query and would look something like.
,CASE
WHEN COUNT(DISTINCT TXN.Transaction_Type) OVER(PARTITION BY User, MONTH(TXN.Date),TXN.Transaction_Type) >=3 THEN 'Frequent'
ELSE 'Infrequent'
END AS [customerType2]
This second way would give you customer type for both the Debits and Credits. I am not aware of a way to filter the COUNT() OVER() to just Credits, for that you would need to use the CTE method.

SQL Query to return a distinct count of one column while allowing a full summation of a second column, grouped by a third

I'm writing a query in access 2010 and i can't use count(distinct... so I'm running into a bit of trouble with what can be found below:
An example of my table is as follows
Provider | Member ID | Dollars | Status
FacilityA | 1001 | 50 | Pended
FacilityA | 1001 | 100 | Paid
FacilityA | 1002 | 200 | Paid
FacilityB | 1005 | 30 | Pended
FacilityB | 1009 | 90 | Pended
FacilityC | 1001 | 100 | Paid
FacilityC | 1008 | 500 | Paid
I want to return the total # of unique members that have visited each facility, but I also want to get the total dollar amount that is Pended, so for this example the ideal output would be
Provider | # members | Total Pended charges
FacilityA | 2 | 50
FacilityB | 2 | 120
FacilityC | 2 | 0
I tried using some code I found here: Count Distinct in a Group By aggregate function in Access 2007 SQL
and here:
SQL: Count distinct values from one column based on multiple criteria in other columns
Copying the code from the first link provided by gzaxx:
SELECT cd.DiagCode, Count(cd.CustomerID)
FROM (select distinct DiagCode, CustomerID from CustomerTable) as cd
Group By cd.DiagCode;
I can make this work for counting the members:
SELECT cd.Provider_Number, Count(cd.Member_ID)
FROM (select distinct Provider_Number, Member_ID from Claims_Table) as cd
ON claims_table.Provider_Number=cd.Provider_Number
Group By cd.Provider_Number;
However, no matter what I try I can't get a second portion dealing with the dollars to work without causing an error or messing up the calculation on the member count.

SELECT cd.Provider_Number,
-- claims_table.Member_ID, claims_table.Dollars
SUM(IIF ( Claims_Table.Status = 'Pended' , Claims_Table.Dollars , 0 )) as Dollars_Pending,
Count(cd.Member_ID) as Uniq_Members,
Sum(Dollars) as Dollar_Wrong
FROM (select distinct Provider_Number, Member_ID from Claims_Table) as cd inner join #claims_table
ON claims_table.Provider_Number=cd.Provider_Number and claims_table.Member_ID = cd.Member_ID
Group By cd.Provider_Number;

This should work fine based only on the table you described (named Tabelle1):
SELECT Provider, count(MemberID) as [# Members],
NZ(SUM(SWITCH([Status]='Pended', Dollars)),0) as [Total pending charges]
FROM Tabelle1
GROUP BY Provider;
Explanation
I think the first and second column are self-explanatory.
The third column is where most things are done. The SWITCH([Status]='Pended', Dollars) returns the Dollars only if the status is pending. This then gets summed up by SUM. The NZ(..,0) will set the column to 0 if the SUM returns a NULL.
EDIT: This was tested on Access 2016

updating nulls based on column

So I got this very inconsistent record for example(just an example):
Manager | Associate | FTE | Revenue
Bob | James | Y | 500
Bob | James | NULL | 100
Bob | James | Y | 200
Kelly | Rick | N | 200
Kelly | Rick | N | 500
Kelly | Rick | NULL | 300
So the goal i wanted was to Sum up the revenue, but the problem is in the group by the nulls kinda split them apart. So i want to write an update statement saying basically "well Looks like James and Bob are both FTE, so lets update that to Y and Kelly and rick are not so update that to no."
How can i fix this? Using MSAccess and of course my table is a lot biger with a lot of different name combos.

You can "impute" the value by using an aggregation function. The following query aggregates by manager/associate and takes the maximum value of fte. This is then joined back to the original data to do the calculation:
select ma.fte, sum(Revenue)
from table as t inner join
(select manager, associate, max(fte) as fte
from table as t
group by manager, associate
) as ma
on t.manager = ma.manager and
t.associate = ma.associate
group by ma.fte;
EDIT:
Immediately after posting this, I realized the join is not necessary. Two aggregations are sufficient:
select ma.fte, sum(Revenue)
from (select manager, associate, max(fte) as fte, sum(Revenue) as Revenue
from table as t
group by manager, associate
) as ma
group by ma.fte;

You haven't given the primary key columns, which makes it a bit harder. I've called it {id} below.
With the nulls, many SQL dialects have an "IfNull" function, but it seems MS-Access does not. You can get the same effect this way:
IIF(ISNULL(column),0,column)
You'd use that in a SELECT as so:
SELECT IIF(ISNULL(Revenue),0,Revenue) FROM ...
For a one-off fix you could do this:
UPDATE {table} SET Revenue=0 WHERE Revenue = NULL;
Doing a join to get the FTE from another row is more complex, and I don't have access handy to see just what the limits and syntax are. The easy to understand way is a nested query:
UPDATE {table} a SET FTE = (SELECT max(FTE) FROM {table} b WHERE FTE IS NOT NULL AND a.{id} = b.{id})
The max() function works here because it ignores nulls, where some other functions return null if you pass a null in.

Fetch Id's that are related to a specific set of items, but not others

Good morning all, apologies for the title... i had trouble simplifying the problem down to a line. My database platform is Teradata.
I am working w/ a table like the following (let's call it "t1")
+------------+----------------------------------------+
| Service_Id | Product |
+------------+----------------------------------------+
| 1 | Traffic |
| 1 | Weather |
| 1 | Travel |
| 1 | Audio |
| 1 | Audio Add-on |
| 2 | Traffic |
| 2 | Weather |
| 2 | Travel |
+------------+----------------------------------------+
I am trying to select service_id's that are related to the following products AND ONLY the following products: Traffic, Weather, Travel
"Service_Id = 1" does not apply here because while it has the required products, it also has an "audio" product related to it... so we have to leave it out. I was able to successfully do this through a series of temp (volatile) tables but it's feeling really hacky and I feel there's got to be a better way. Thanks for your assistance.

I'm doing stuff like that (find a subset/superset/exact match for a set of rows) in my training classes using pizzas :-)
There are several ways to get your result, but for an exact match the easiest way is a SUM using following logic:
SELECT service_id
FROM t1
GROUP BY 1
HAVING
SUM(CASE WHEN Product IN ('Traffic', 'Weather', 'Travel') THEN 1 ELSE -1 END = 3

Assuming that Product is unique for every service_ID.
SELECT service_ID
FROM tableName a
WHERE Product IN ('Traffic', 'Weather', 'Travel') AND
EXISTS
(
SELECT 1
FROM tableName b
WHERE a.Service_ID = b.Service_ID
GROUP BY b.Service_ID
HAVING COUNT(*) = 3 -- <<== total number of products
)
GROUP BY service_ID
HAVING COUNT(*) = 3 -- <<== total number of products
SQLFiddle Demo (demo is running under MySQL database, not sure if it will work on teradata)

Best way to join the two tables including duplicates from one table

Accounts (table)
+----+----------+----------+-------+
| id | account# | supplier | RepID |
+----+----------+----------+-------+
| 1 | 123xyz | Boston | 2 |
| 2 | 245xyz | Chicago | 2 |
| 3 | 425xyz | Chicago | 3 |
+----+----------+----------+-------+
PayOut (table)
+----+----------+----------+-------------+--------+
| id | account# | supplier | datecreated | Amount |
+----+----------+----------+-------------+--------+
| 5 | 245xyz | Chicago | 01-15-2009 | 25 |
| 6 | 123xyz | Boston | 10-15-2011 | 50 |
| 7 | 123xyz | Boston | 10-15-2011 | -50 |
| 8 | 123xyz | Boston | 10-15-2011 | 50 |
| 9 | 425xyz | Chicago | 10-15-2011 | 100 |
+----+----------+----------+-------------+--------+
I have accounts table and I have payout table. Payout table comes from abroad so we do not have any control over it. This leaves us with a problem that we can't join the two tables based on record ID field, that is one problem which we can't solved. We therefore join based on Account#, SupplierID (2nd and 3rd column). This creates a problem that it creates (possibly) many to many relationship. But we filter our records if they are active and we use a second filter on payout table when the payout was created. Payout are created months to month. There are two problems with this in my view
The query takes quite a bit of time to complete (could be inefficient)
There are certain duplicates that are removed which should not be removed. Example is record 6 and 8 in payout table. What happened here is, we got a customer, then the customer cancelled then he got him back. In this case +50, -50 and +50. Again all values are valid and must show in the report for audit purposes. Currently only one +50 is shown, the other is lost. There are a couple of other problems within the report that comes once in a while.
Here is the query. It uses groups by to remove duplicates. I would like to have an advance query which outperforms and which does takes into account that no record in PayOut table is duplicated as long as they come up in the month of the report.
Here is our current query
/* Supplied to Store Procedure */
-----------------------------------
#RepID // the person for whome payout is calculated
#Month // of payment date
#year // year of payment date
-----------------------------------
select distinct
A.col1,
A.col2,
...
A.col10,
B.col2,
B.Col2,
B.Amount /* this is the important column, portion of which goes to Rep */
from records A
JOIN payout B
on A.Supplier = B.Supplier AND A.Account# = B.Account#
where datepart(mm, B.datecreated) = #Month /* parameter to stored procedure */
and datepart(yyyy, B.datecreated) = #Year
and A.[rep ID] = #RepID /* parameter to SP */
group by
col1,col2,col3,....col10
order by customerName
Is this query optimum? Can I improve it using CROSS APPLY or WHERE EXISTs that will make it faster as well as remove the duplicate problem?
Note that this query is used to get payout of a rep. Hence every record has repid field who it is assigned to. Ideally I would like to use Select WHERE Exist query.

It's difficult to understand exactly what you want because in one place you say you 'want' the duplicates but then you say that you are using the group by to remove duplicates. So the first thought would be "Why not just get rid of the group by?". But I have to believe you are smart enough to have thought of that yourself, so I assume it's got to be there for a reason.
I think someone here could help you pretty easily if you could post the actual query, but since you say you can't I will just try to give you some direction in solving the problem...
Instead of trying to do everything in one statement, use temporary tables or views to split it up. It may be easier for you to think about how to get rid of the duplicates you don't want and keep the ones you do first and put those into a temporary table, and then join the tables together and work with that.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Adding another column based on different criteria (SQL-server) - sql

Use CASE SELECT c.Agent, SUM(s.price) AS TotalSales, SUM(CASE WHEN s.OutboundCallDate IS NOT NULL THEN s.price ELSE 0 END) AS OutboundSales FROM contact c, sales s WHERE c.id = s.id GROUP BY c.agent

I think the code would look SELECT contact.agent, SUM(sales.price) FROM contact, sales WHERE contact.id = sales.id AND SUM(WHERE sales.OutboundCallDate) GROUP BY contact.agent

Related

SQL - Identify if a user is present every month

SQL Query to return a distinct count of one column while allowing a full summation of a second column, grouped by a third

updating nulls based on column

Fetch Id's that are related to a specific set of items, but not others

Best way to join the two tables including duplicates from one table

Categories

Resources

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Adding another column based on different criteria (SQL-server) - sql

Use CASE SELECT c.Agent, SUM(s.price) AS TotalSales, SUM(CASE WHEN s.OutboundCallDate IS NOT NULL THEN s.price ELSE 0 END) AS OutboundSales FROM contact c, sales s WHERE c.id = s.id GROUP BY c.agent

I think the code would look SELECT contact.agent, SUM(sales.price) FROM contact, sales WHERE contact.id = sales.id AND SUM(WHERE sales.OutboundCallDate) GROUP BY contact.agent

Related

SQL - Identify if a user is present every month

SQL Query to return a distinct count of one column while allowing a full summation of a second column, grouped by a third

updating nulls based on column

Fetch Id's that are related to a specific set of items, but not others

Best way to join the two tables *including* duplicates from one table

Categories

Resources

Best way to join the two tables including duplicates from one table