How do I use array_agg with a condition? - sql

I have a table with a list of potential customers, their activity, and their sales representative. Every customer can have at most one sales rep. I've built a summary table where I aggregate the customer activity, group it by sales rep, and filter by the customer creation date. This is NOT a cohort (the customers do not all correspond to the scheduled_flights, but rather this is a snapshot of activity for a given period of time). It looks something like this:
Now, in addition to the total number of customers, I'd also like to output an array of those actual customers. The customers field is currently calculated by performing sum(is_customer) as customers and then grouping by the sales rep. To build the array, I've tried to do array_agg(customer_name) which outputs the list of all customer names -- I just need the list of names who also satisfy the condition that is_customer = 1, but I can't use that as a where clause since it would filter out other activity, like scheduled and completed flights for customers that were not new.

This should probably work:
array_agg(case when is_customer = 1 then customer_name end) within group (order by customer_name)
Snowflake should ignore NULL values in the aggregation.
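To make the idea concrete, here is a minimal sketch of conditional aggregation. It is not Snowflake: it runs against SQLite through Python's sqlite3 module, and it swaps array_agg for SQLite's GROUP_CONCAT, which also skips NULLs. The table name activity and the sample rows are made up for illustration.

```python
import sqlite3

# In-memory database with a hypothetical "activity" table (names are assumptions).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE activity (sales_rep TEXT, customer_name TEXT, is_customer INTEGER);
INSERT INTO activity VALUES
  ('ann', 'Acme', 1),
  ('ann', 'Birch', 0),
  ('ann', 'Cobalt', 1),
  ('bob', 'Dune', 0);
""")

# The CASE expression yields NULL for non-customers; GROUP_CONCAT skips NULLs,
# so only names with is_customer = 1 are collected, with no WHERE clause needed.
rows = conn.execute("""
SELECT sales_rep,
       SUM(is_customer) AS customers,
       GROUP_CONCAT(CASE WHEN is_customer = 1 THEN customer_name END) AS customer_names
FROM activity
GROUP BY sales_rep
ORDER BY sales_rep
""").fetchall()

# GROUP_CONCAT returns NULL (None) when every input is NULL, so map that to [].
summary = {rep: (n, sorted(names.split(",")) if names else []) for rep, n, names in rows}
print(summary)
```

Because the condition lives inside the aggregate rather than in a WHERE clause, the other per-rep aggregates still see every row.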

Related

SQL QUERY for sum loans per customer

I need a query that returns all customers whose name contains the string "Will", and their associated total loan values.
Loan totals should be sorted from largest amount to smallest amount and the loans totals column should be called "TotalLoanValue".
Only one record per customer should be returned.
SELECT name, loan_amount
FROM customers, loans
WHERE name LIKE '%WILL%'
I have written that query, but I'm having a hard time figuring out how to sum all the loan values per customer.
First things first:
If you want to ask further questions here, please read and follow this: How to create a good example instead of just adding a link.
Otherwise, you risk having your questions closed without ever getting an answer.
To answer your question:
We need to JOIN the two tables by their common column and then build the SUM of all loan amounts with a GROUP BY clause of the customer name.
I didn't follow your link because I wouldn't know if this is spam, so let's say the customer table has a column "id" and the loan table a column "customer_id".
Then your query will look like this:
SELECT c.name, SUM(l.loan_amount) AS TotalLoanValue
FROM customers c
JOIN loans l
ON c.id = l.customer_id
WHERE c.name LIKE '%Will%'
GROUP BY c.name
ORDER BY SUM(l.loan_amount) DESC, c.name;
The ORDER BY clause makes sure to begin with the customer having the highest sum of loan amounts.
The "c.name" at the end of the ORDER BY clause could be removed if we don't care about the order if different customers have the same sum of loan amounts.
Otherwise, if we use the query as shown, the result will be sorted by the sum of loan amounts first and then, as a tiebreaker, by the customer name, i.e. customers having an identical sum of loan amounts will be sorted by their name.
Try out with some sample data here: db<>fiddle
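For a self-contained way to try this, here is a small sketch that runs the same JOIN, SUM and GROUP BY pattern against an in-memory SQLite database via Python's sqlite3. The column names id and customer_id are the same assumptions made in the answer, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Schema is assumed: customers.id links to loans.customer_id.
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE loans (customer_id INTEGER, loan_amount INTEGER);
INSERT INTO customers VALUES (1, 'William Hart'), (2, 'Willa Moss'), (3, 'Jane Doe');
INSERT INTO loans VALUES (1, 100), (1, 250), (2, 500), (3, 900);
""")

# One row per matching customer, largest total first, name as tiebreaker.
totals = conn.execute("""
SELECT c.name, SUM(l.loan_amount) AS TotalLoanValue
FROM customers c
JOIN loans l ON c.id = l.customer_id
WHERE c.name LIKE '%Will%'
GROUP BY c.name
ORDER BY TotalLoanValue DESC, c.name
""").fetchall()
print(totals)
```

Note that LIKE is case-insensitive for ASCII in SQLite by default; other databases may treat '%Will%' and '%will%' differently.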

Return defined number of unique values in separate columns all meeting same 'Where' Criteria

We enter overrides based on a unique value from our tables (we have two columns with unique values for each transaction, so may or may not be primary key).
Sometimes we have to enter multiple overrides based on the same set of criteria, so it would be nice to be able to pull multiple unique values in one query that all meet the same criteria in the where clause as our system throws a warning if the same unique id is used for more than one override.
Say we have some customers that were undercharged for three months and we need to enter a commission override for each of the three sales people that split the accounts for each month:
I've tried the following code, but the same value gets returned for each column:
select month, customer, product, sum(sales),
any_value(unique_id)unique_id1,
any_value(unique_id)unique_id2,
any_value(unique_id)unique_id3
from table
where customer in (j,k,l) and product = m and year = o
group by 1,2,3;
This will give me a row for each month and customer, but the values in unique_id1, unique_id2 and unique_id3 are the same on each row.
I was able to use:
select month, customer, product, sum(sales),
string_agg(unique_id, "," LIMIT 3)
from table
where customer in (j,k,l) and product = m and year = o
group by 1,2,3;
and split the unique_ids in a spreadsheet but I feel there has to be a better way to accomplish this directly in SQL.
I figure I could use a sub query and select column based on row 1,2,3, but I'm trying to eliminate the redundancy of including the same 'where' criteria in the sub query.
Below is for BigQuery Standard SQL
I think your second query was close enough to get to something like below
#standardSQL
SELECT month, customer, product, sales,
arr[OFFSET(0)] unique_id1,
arr[SAFE_OFFSET(1)] unique_id2,
arr[SAFE_OFFSET(2)] unique_id3
FROM (
SELECT month, customer, product, SUM(sales) sales,
ARRAY_AGG(unique_id ORDER BY month DESC LIMIT 3) arr
FROM `project.dataset.table`
WHERE customer IN ('j','k','l') AND product = 'm' AND year = 2019
GROUP BY month, customer, product
)
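Outside BigQuery, a similar result can be had without arrays. The sketch below swaps in a different technique: ROW_NUMBER() numbers the rows in each group and conditional aggregation spreads the first three unique_id values into columns. It runs on SQLite through Python's sqlite3 (window functions need SQLite 3.25 or later); the table name sales and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical table and data, for illustration only.
CREATE TABLE sales (month INTEGER, customer TEXT, product TEXT, sales INTEGER, unique_id TEXT);
INSERT INTO sales VALUES
  (1, 'j', 'm', 10, 'a1'),
  (1, 'j', 'm', 20, 'a2'),
  (1, 'j', 'm', 30, 'a3'),
  (1, 'j', 'm', 40, 'a4'),
  (2, 'k', 'm', 5,  'b1');
""")

# The WHERE criteria appear once, in the inner query. rn numbers the rows per
# group; each MAX(CASE ...) picks the unique_id at one position, or NULL if
# the group has fewer rows than that.
pivoted = conn.execute("""
SELECT month, customer, product, SUM(sales) AS sales,
       MAX(CASE WHEN rn = 1 THEN unique_id END) AS unique_id1,
       MAX(CASE WHEN rn = 2 THEN unique_id END) AS unique_id2,
       MAX(CASE WHEN rn = 3 THEN unique_id END) AS unique_id3
FROM (
  SELECT *, ROW_NUMBER() OVER (
           PARTITION BY month, customer, product ORDER BY unique_id) AS rn
  FROM sales
  WHERE customer IN ('j', 'k', 'l') AND product = 'm'
)
GROUP BY month, customer, product
ORDER BY month, customer
""").fetchall()
print(pivoted)
```

Groups with fewer than three rows get NULL in the missing columns, which mirrors what SAFE_OFFSET does in the array version.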

Sql join only on the first match

I don't know how to handle the following case.
I have the sales info in a table:
Number of Bill (key),
Internal number (key),
Client,
Date (month-year),
Product group,
Product,
Quantities,
Total,
Sales man.
I need to join this sales table with the annual forecast sales table, which is the following:
Date (key),
Group product(key),
Sales man (key),
Total.
In each table, the combination of the key columns is the primary key. I need to add the forecast to the sales table. To do this, I need to attach the forecast total to the actual sales only on the first match of date, product group and sales man, so that the forecast total does not get inflated (a sales man can sell the same product group to the same client on the same day multiple times).
.. only on the first match of date, group product and sales man ..
You can use window functions for this, consider using ROW_NUMBER() OVER(PARTITION BY ... ORDER BY ... ). First match has row number of 1.
More information and examples (sales!) can be found on MSDN.
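A minimal sketch of the ROW_NUMBER() idea, run on SQLite through Python's sqlite3 (window functions need SQLite 3.25 or later). The table and column names (sales, forecast, bill_no, sale_date, product_group, salesman) are assumptions loosely based on the question, and ordering by bill_no to decide which row counts as "first" is also an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical, simplified versions of the two tables.
CREATE TABLE sales (bill_no INTEGER, sale_date TEXT, product_group TEXT, salesman TEXT, total INTEGER);
CREATE TABLE forecast (sale_date TEXT, product_group TEXT, salesman TEXT, total INTEGER);
INSERT INTO sales VALUES
  (1, '2019-01', 'tools', 'ann', 100),
  (2, '2019-01', 'tools', 'ann', 150),
  (3, '2019-01', 'paint', 'bob', 80);
INSERT INTO forecast VALUES
  ('2019-01', 'tools', 'ann', 300),
  ('2019-01', 'paint', 'bob', 90);
""")

# Number the sales rows within each (date, group, salesman) key, then expose
# the forecast total only on the row numbered 1 so it is never counted twice.
rows = conn.execute("""
SELECT s.bill_no, s.total,
       CASE WHEN s.rn = 1 THEN f.total END AS forecast_total
FROM (
  SELECT *, ROW_NUMBER() OVER (
           PARTITION BY sale_date, product_group, salesman ORDER BY bill_no) AS rn
  FROM sales
) s
LEFT JOIN forecast f
  ON  f.sale_date = s.sale_date
  AND f.product_group = s.product_group
  AND f.salesman = s.salesman
ORDER BY s.bill_no
""").fetchall()

# Summing the forecast column now matches the forecast table's own total.
forecast_sum = sum(r[2] or 0 for r in rows)
print(rows, forecast_sum)
```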

Check whether a record in DB with same ID has different value in field

I would like to check whether a product (identified by Product_ID) has records with more than one delivery date, that is, whether the product was received in different lots or all in one day. If the query returns just one row, it was all delivered on the same day; if more than one row is returned, it was delivered on different days.
Table PRODUCT
ID (PK) | Product_ID | Type | Deliver_Date | Amount
I've tried with a group by and distinct with no result.
EDIT: Query I had so far...
SELECT DISTINCT ,
count(*)
FROM PUBLIC.product
WHERE product_id = ?
AND deliver_date = ?
HAVING count() = 1
Your use of the DISTINCT keyword is incorrect.
You can either use DISTINCT with a list of fields following it, telling which fields you want to have distinct, or use an aggregate function like COUNT, SUM, MIN etc. together with grouping fields.
E.g.
SELECT DISTINCT Product_Id, Deliver_date
FROM ...
WHERE ...
means "give me all the distinct combinations of product id and deliver date". This is not actually what you need, as DISTINCT Product_id will simply tell you which product IDs there are, but not how many of them and in which dates, and DISTINCT Product_id, Deliver_date will give you all possible combinations of product ID and deliver date, but you'll need to count them manually.
The GROUP BY construct is more informative
SELECT count(*), Product_Id
FROM ...
WHERE ...
GROUP BY Product_Id
This groups the rows by the product ID and tells you how many rows there are for each product ID.
But what if you have several rows with the same product ID and the same date? You'll get a number greater than 1 in that query, but it won't help you because you wanted to distinguish between products delivered on two different days and products delivered all on the same day.
To do this, you need to use COUNT(DISTINCT(Deliver_date)):
SELECT COUNT(DISTINCT(Deliver_date)), Product_ID
FROM ...
WHERE ...
GROUP BY Product_ID
This means:
* Separate the rows into groups by the Product_ID.
* Inside each such group (all the rows that have that Product_ID), find all the distinct Deliver_date values. Count how many such Deliver_date values there are inside the group.
So if the product was delivered 10 times within the same day, you'll just have one distinct delivery date for that product ID. The COUNT will return 1. If it was delivered 5 times on day x, and 5 times on day y, then you'll have two distinct delivery dates (x and y), and the COUNT will return 2.
Now, if you want to eliminate all the ones that were all delivered on the same day (the count of distinct dates is 1), you add a HAVING clause to your query:
SELECT COUNT(DISTINCT(Deliver_date)), Product_Id
FROM PRODUCT
GROUP BY Product_Id
HAVING COUNT(DISTINCT(Deliver_date)) > 1
This will give you a list of all the products that were delivered on at least two separate dates.
Of course, if you just want to check if a particular product was delivered on more than one date, it's simpler:
SELECT COUNT(DISTINCT(Deliver_date))
FROM PRODUCT
WHERE Product_ID = ?
This will give you the number of distinct days on which this product was delivered. If it was delivered on just one day, the result will be 1. If it was delivered on more than one day, the result will be a number greater than one.
To sum up:
There is no such thing as "DISTINCT ,". DISTINCT is always followed by the name of a field or fields that are supposed to be distinct.
But there is a COUNT(DISTINCT(field_name)) which counts how many distinct values the field has in a group or a result set.
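The queries above can be tried on a toy table. This is a sketch on SQLite through Python's sqlite3; the sample rows are invented, and the columns follow the PRODUCT table from the question (Type is left out).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (id INTEGER PRIMARY KEY, product_id INTEGER, deliver_date TEXT, amount INTEGER);
-- Product 10: two deliveries on the same day. Product 20: two different days.
INSERT INTO product (product_id, deliver_date, amount) VALUES
  (10, '2020-01-01', 5),
  (10, '2020-01-01', 7),
  (20, '2020-01-01', 3),
  (20, '2020-01-02', 4);
""")

# Number of distinct delivery dates per product.
counts = dict(conn.execute("""
SELECT product_id, COUNT(DISTINCT deliver_date)
FROM product
GROUP BY product_id
""").fetchall())

# Only products delivered on at least two separate dates.
multi_day = [r[0] for r in conn.execute("""
SELECT product_id
FROM product
GROUP BY product_id
HAVING COUNT(DISTINCT deliver_date) > 1
""")]
print(counts, multi_day)
```

A plain COUNT(*) would report 2 for both products here, which is why the DISTINCT inside the COUNT matters.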

Select records for MySQL only once based on a column value

I have a table that stores transaction information. Each transaction has a unique (auto incremented) id column, a column with the customer's id number, a column called bill_paid which indicates if the transaction has been paid for by the customer with a yes or no, and a few other columns which hold other information not relevant to my question.
I want to select all customer ids from the transaction table for which the bill has not been paid, but if the customer has had multiple transactions where the bill has not been paid I DO NOT want to select them more than once. This way I can generate that customer one bill with all the transactions they owe for instead of a separate bill for each transaction. How would I build a query that did that for me?
Returns exactly one customer_id for each customer with bill_paid equal to 'no':
SELECT
t.customer_id
FROM
transactions t
WHERE
t.bill_paid = 'no'
GROUP BY
t.customer_id
Edit:
GROUP BY summarises your resultset.
Caveat: Every column selected must be either 'grouped by' or aggregated in some fashion. As shown by nikic you could use SUM to get the total amount owed, e.g.:
SELECT
t.customer_id
, SUM(t.amount) AS TOTAL_OWED
FROM
transactions AS t
WHERE
t.bill_paid = 'no'
GROUP BY
t.customer_id
t is simply an alias.
So instead of typing transactions everywhere you can now simply type t. The alias is not necessary here since you query only one table, but I find them invaluable for larger queries. You can optionally type AS to make it more clear that you're using an alias.
You might try the GROUP BY clause, e.g. group by the customer:
SELECT customer, SUM(toPay) FROM .. GROUP BY customer
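As a quick check of the GROUP BY approach, here is a sketch that runs it on SQLite through Python's sqlite3. The amount column comes from the SUM example above; the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transactions (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  customer_id INTEGER,
  bill_paid TEXT,
  amount INTEGER
);
-- Customer 1 has two unpaid transactions, customer 2 none, customer 3 one.
INSERT INTO transactions (customer_id, bill_paid, amount) VALUES
  (1, 'no', 20), (1, 'no', 30), (1, 'yes', 99), (2, 'yes', 10), (3, 'no', 5);
""")

# Each customer with unpaid bills appears exactly once, with the total owed.
owed = conn.execute("""
SELECT t.customer_id, SUM(t.amount) AS total_owed
FROM transactions AS t
WHERE t.bill_paid = 'no'
GROUP BY t.customer_id
ORDER BY t.customer_id
""").fetchall()
print(owed)
```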