IF in SQL to choose which values to select

I am trying to use an IF or CASE statement in sql to choose when to select a value in a column. Essentially I have some data in a table like so:
My goal is to see which items are ordered multiple weeks in a row by the same customer. I have 1 month of dates, but I can do 7 separate queries with 1 query for each day of the week. I'm trying to do something like:
Select item, date, customer, truck
If customer, item combo appears in multiple weeks
Please let me know if you have any idea how I can do this!

Assuming you have at most one row per week per customer and item (as in the sample data), you can use lead() and lag(). The following assumes that you mean exactly 7 days apart:
select t.*
from (select t.*,
lag(orderdate) over (partition by customer, itemid order by orderdate) as prev_orderdate,
lead(orderdate) over (partition by customer, itemid order by orderdate) as next_orderdate
from t
) t
where prev_orderdate = orderdate - interval '7 day' or
next_orderdate = orderdate + interval '7 day';
Note that date/time functionality is highly database dependent, so you might have to adjust for your database functions.
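As a rough way to try the lag()/lead() idea without a database server, here is a minimal sketch that runs the same query shape through Python's built-in sqlite3 module. The table name t and the columns customer, itemid and orderdate come from the answer above; the sample rows are made up, and SQLite's date() modifier stands in for the interval arithmetic. It assumes the bundled SQLite is 3.25 or newer, since older versions have no window functions.

```python
import sqlite3

# Made-up sample rows: (customer, itemid, orderdate as ISO text).
SAMPLE_ROWS = [
    ("acme", "widget", "2024-01-01"),
    ("acme", "widget", "2024-01-08"),  # exactly 7 days after the previous order
    ("acme", "widget", "2024-01-22"),  # 14 days after, so not a consecutive week
    ("acme", "bolt", "2024-01-02"),
    ("zed", "widget", "2024-01-01"),
    ("zed", "widget", "2024-01-15"),
]


def repeat_week_orders(rows):
    """Return orders whose customer/item pair was also ordered exactly 7 days before or after."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (customer TEXT, itemid TEXT, orderdate TEXT)")
    con.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    query = """
        SELECT customer, itemid, orderdate
        FROM (SELECT t.*,
                     LAG(orderdate)  OVER (PARTITION BY customer, itemid
                                           ORDER BY orderdate) AS prev_orderdate,
                     LEAD(orderdate) OVER (PARTITION BY customer, itemid
                                           ORDER BY orderdate) AS next_orderdate
              FROM t) AS x
        -- SQLite has no INTERVAL literal; date(..., '-7 days') plays that role here
        WHERE prev_orderdate = date(orderdate, '-7 days')
           OR next_orderdate = date(orderdate, '+7 days')
        ORDER BY customer, itemid, orderdate
    """
    result = con.execute(query).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in repeat_week_orders(SAMPLE_ROWS):
        print(row)
```

With the sample rows, only the two acme/widget orders on 2024-01-01 and 2024-01-08 should come back, because they are the only pair exactly one week apart.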

Related

SQL - Counting users that have multiple transactions and have at least one transaction that has been made within 7 days interval of the other one

Here is the task: count users that have multiple transactions and have at least one transaction made within 7 days of another one.
Structure of dataset: Row, userId, orderId, date
Date is formatted as YYYY-MM-DDTHH:MM:SS Example: 2016-09-16T11:32:06
I have completed the first part (counting users with multiple transactions), but I do not know how to do the second part in the same query. I will be thankful for help.
Here is the console:
query = '''
SELECT COUNT(*)
FROM
(SELECT userId FROM `dataset` GROUP BY userId HAVING COUNT(orderId) > 1)
'''
project_id = 'acdefg'
df = pd.io.gbq.read_gbq(query, project_id=project_id, dialect='standard')
display(df)
To solve this, you want to be able to compare each record to the previous record: when was the last order from the same user? This hints at the use of partitions and window functions, in this case LAG.
A possible way to solve the problem is to organise records per user and order them by orderDate and then for each record have a look at the record just above:
WITH intermediate_table AS (
SELECT
userId,
orderDate,
LAG(orderDate)
OVER (PARTITION BY userId ORDER BY orderDate) AS previous_order -- this is where we pick the orderDate of the record right above, once the orders are organized by userId and ordered by orderDate
FROM `dataset.table`
)
SELECT userId
FROM intermediate_table
WHERE DATE_DIFF(orderDate, previous_order, DAY) <= 7
GROUP BY userId
Once orderDate and previous_order are gathered in the same record, it's easy to compare them and see if there are no more than 7 days between the two.
(GROUP BY is used for returning userIds only once in the resulting table)
This may be what you need:
-- for each order calculate the days since that customer's last order
WITH order_profiler AS (
SELECT
orderId,
orderDate,
custId,
DATE_DIFF(orderDate, LAG(orderDate) OVER (PARTITION BY custId ORDER BY orderDate), day) AS order_latency_days,
FROM
`dataset.table`
)
SELECT
custId,
FROM order_profiler
WHERE order_latency_days <= 7
GROUP BY custId
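Both queries above return the matching user IDs rather than the count the question asks for. A minimal sketch of getting the number directly, again using Python's sqlite3 as a stand-in for BigQuery: the table name orders and the sample rows are made up, and julianday() is SQLite's substitute for DATE_DIFF. One assumption to be aware of: julianday() measures elapsed time in fractional days, whereas DATE_DIFF(..., DAY) counts calendar-day boundaries, so edge cases near the 7-day mark can differ. SQLite 3.25+ is assumed for LAG.

```python
import sqlite3

# Made-up sample rows: (userId, orderId, orderDate in YYYY-MM-DDTHH:MM:SS form).
SAMPLE_ORDERS = [
    ("u1", "o1", "2016-09-01T10:00:00"),
    ("u1", "o2", "2016-09-05T09:00:00"),  # under 4 days after o1
    ("u2", "o3", "2016-09-01T10:00:00"),
    ("u2", "o4", "2016-09-20T10:00:00"),  # 19 days after o3
    ("u3", "o5", "2016-09-16T11:32:06"),  # single order, no previous row
    ("u4", "o6", "2016-01-01T00:00:00"),
    ("u4", "o7", "2016-03-01T00:00:00"),
    ("u4", "o8", "2016-03-07T00:00:00"),  # 6 days after o7
]


def count_users_with_close_orders(orders, max_days=7):
    """Count users with at least one order placed within max_days of their previous order."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE orders (userId TEXT, orderId TEXT, orderDate TEXT)")
    con.executemany("INSERT INTO orders VALUES (?, ?, ?)", orders)
    query = """
        WITH with_previous AS (
            SELECT userId,
                   orderDate,
                   LAG(orderDate) OVER (PARTITION BY userId
                                        ORDER BY orderDate) AS previous_order
            FROM orders
        )
        SELECT COUNT(DISTINCT userId)
        FROM with_previous
        -- first orders have previous_order = NULL, so the comparison filters them out
        WHERE julianday(orderDate) - julianday(previous_order) <= ?
    """
    (count,) = con.execute(query, (max_days,)).fetchone()
    con.close()
    return count


if __name__ == "__main__":
    print(count_users_with_close_orders(SAMPLE_ORDERS))
```

A user with a qualifying gap necessarily has more than one order, so the "multiple transactions" condition does not need a separate HAVING clause in this form.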

How to count the number of orders in the first hour in SQL

Let's say I have a table like:
DateTime - When the order was placed
CustomerId - The id of the customer
<other fields>
How do I create a query that will tell me the total number of orders a customer placed in the hour after creating their first order?
I'm currently finding the first order per customer, joining it back to the original table, and doing a countif but wondering if there is a way to do it in a single step.
In Standard SQL, you can use logic like this:
select customerid, count(*)
from (select t.*,
min(datetime) over (partition by customerid) as min_datetime
from t
) t
where datetime < min_datetime + interval '1 hour'
group by customerid;
Date/time functions differ significantly among databases, but this general structure should work in almost every database.
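To make the "adjust for your database" part concrete, here is a small sketch of the same structure in SQLite, driven from Python's sqlite3 module. The column is renamed to placed_at (a hypothetical name, chosen to avoid clashing with SQLite's datetime() function), the sample rows are made up, and datetime(..., '+1 hour') replaces the interval expression. It assumes timestamps are stored as 'YYYY-MM-DD HH:MM:SS' text so that string comparison matches chronological order, and SQLite 3.25+ for the window function.

```python
import sqlite3

# Made-up sample rows: (customerid, placed_at as 'YYYY-MM-DD HH:MM:SS' text).
SAMPLE_ORDERS = [
    ("c1", "2024-01-01 09:00:00"),  # first order for c1
    ("c1", "2024-01-01 09:30:00"),  # inside the first hour
    ("c1", "2024-01-01 10:00:00"),  # exactly one hour later, excluded by the strict <
    ("c1", "2024-01-01 12:00:00"),
    ("c2", "2024-01-02 08:00:00"),  # only order for c2
]


def first_hour_order_counts(orders):
    """Return (customerid, count) for orders placed less than 1 hour after each customer's first order."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (customerid TEXT, placed_at TEXT)")
    con.executemany("INSERT INTO t VALUES (?, ?)", orders)
    query = """
        SELECT customerid, COUNT(*) AS orders_in_first_hour
        FROM (SELECT t.*,
                     MIN(placed_at) OVER (PARTITION BY customerid) AS min_placed_at
              FROM t) AS x
        WHERE placed_at < datetime(min_placed_at, '+1 hour')
        GROUP BY customerid
        ORDER BY customerid
    """
    result = con.execute(query).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in first_hour_order_counts(SAMPLE_ORDERS):
        print(row)
```

The first order itself is included in the count, since it always satisfies the condition; subtract 1 if only follow-up orders should be counted.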

Is there a way to count how many strings in a specific column are seen for the 1st time?

The value in column 2 is sometimes repeated, because some clients make several transactions at different times (a client can make a transaction in the first month and then another the following year).
Is there a way for me to count how many IDs are completely new per month through a group by (never seen before)?
Please let me know if you need more context.
Thanks!
A simple way is two levels of aggregation. The inner level gets the first date for each customer. The outer summarizes by year and month:
select year(min_date), month(min_date), count(*) as num_firsts
from (select customerid, min(date) as min_date
from t
group by customerid
) c
group by year(min_date), month(min_date)
order by year(min_date), month(min_date);
Note that date/time functions depend on the database you are using, so the syntax for getting the year/month from the date may differ in your database.
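As one example of how the year/month extraction changes per database, here is a minimal sketch of the two-level aggregation in SQLite, run through Python's sqlite3 module. The column name tx_date is hypothetical (the answer calls it date), the sample rows are made up, and strftime() stands in for year() and month().

```python
import sqlite3

# Made-up sample rows: (customerid, tx_date as ISO text).
SAMPLE_TRANSACTIONS = [
    ("a", "2023-01-05"),
    ("a", "2024-02-01"),  # repeat customer, must not count as new in 2024-02
    ("b", "2023-01-20"),
    ("c", "2023-02-03"),
    ("c", "2023-02-10"),
    ("d", "2024-01-15"),
]


def new_customers_per_month(transactions):
    """Return (year, month, count) of customers whose first transaction falls in that month."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (customerid TEXT, tx_date TEXT)")
    con.executemany("INSERT INTO t VALUES (?, ?)", transactions)
    query = """
        SELECT strftime('%Y', min_date) AS year,
               strftime('%m', min_date) AS month,
               COUNT(*) AS num_firsts
        FROM (SELECT customerid, MIN(tx_date) AS min_date  -- inner level: first date per customer
              FROM t
              GROUP BY customerid) AS c
        GROUP BY strftime('%Y', min_date), strftime('%m', min_date)
        ORDER BY strftime('%Y', min_date), strftime('%m', min_date)
    """
    result = con.execute(query).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in new_customers_per_month(SAMPLE_TRANSACTIONS):
        print(row)
```

Months in which no customer is new simply do not appear in the result; grouping by year as well as month keeps January 2023 and January 2024 separate.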
You can do the following, which assigns a rank to each transaction within its customer_id (rank 1 therefore means that it is the first order for that customer_id).
The ranking is computed in an inline view, and the inline view is then queried to give you the month and the count of customer IDs for that month ONLY if their rank = 1.
I have tested this on Oracle and it works as expected.
SELECT DISTINCT
EXTRACT(MONTH FROM date_of_transaction) AS month,
COUNT(customer_id)
FROM
(
SELECT
date_of_transaction,
customer_id,
RANK() OVER(PARTITION BY customer_id
ORDER BY
date_of_transaction ASC
) AS rank
FROM
table_1
)
WHERE
rank = 1
GROUP BY
EXTRACT(MONTH FROM date_of_transaction)
ORDER BY
EXTRACT(MONTH FROM date_of_transaction) ASC;
First, associate every ID with the year and month in which it is completely new, then count while grouping by year and month:
SELECT count(*) as new_customers, extract(year from t1.date) as year,
extract(month from t1.date) as month FROM table t1
WHERE not exists (SELECT 1 FROM table t2 WHERE t1.id = t2.id AND t2.date < t1.date)
GROUP BY year, month;
Your results will contain the new customer count, the year and the month.

FirebirdSQL Unique & Max (or MaxValue)

I want to run a report from sales of any customer that has ordered in the last two years.
I can run a report of all invoices dated within two years and then remove duplicates in Excel, but I would rather do it directly within (Firebird) SQL.
I can use a WHERE date >= 1 Jan 2015 (2 years or thereabouts), but how do I get it to show each customer only once? I thought I could use MAX(Date) to show the most recent date in that two-year period. Where am I going wrong? I believe I need to use a UNIQUE() function like UNIQUE(ORDERCUSTOMER) within the SELECT clause.
SELECT
FINANCIALSALESINVOICES.TRANSACTIONDATE,
FINANCIALSALESINVOICES.INVOICECUSTOMER,
FINANCIALSALESINVOICES.ORDERCUSTOMER,
FINANCIALSALESINVOICES.INVOICENUMBER,
FINANCIALSALESINVOICES.SOURCENUMBER,
MAX(FINANCIALSALESINVOICES.TRANSACTIONDATE)
FROM FINANCIALSALESINVOICES
WHERE (FINANCIALSALESINVOICES.TRANSACTIONDATE>={d '2015-01-01'})
ORDER BY FINANCIALSALESINVOICES.INVOICECUSTOMER, FINANCIALSALESINVOICES.TRANSACTIONDATE
I did have it showing the max date for each invoice in the past two years, but now I can't find that file or replicate it.
One approach is to use a subquery in the WHERE clause which checks for the most recent invoice:
SELECT
t.TRANSACTIONDATE,
t.INVOICECUSTOMER,
t.ORDERCUSTOMER,
t.INVOICENUMBER,
t.SOURCENUMBER
FROM FINANCIALSALESINVOICES t
WHERE t.TRANSACTIONDATE >= date '2015-01-01' AND
t.TRANSACTIONDATE = (SELECT MAX(f.TRANSACTIONDATE)
FROM FINANCIALSALESINVOICES f
WHERE t.ORDERCUSTOMER = f.ORDERCUSTOMER AND
f.TRANSACTIONDATE >= date '2015-01-01')
ORDER BY t.INVOICECUSTOMER,
t.TRANSACTIONDATE
With Firebird 3 you can use row_number() to assign a unique value to each row within a group (partition), that value can then be filtered on:
select
a.TRANSACTIONDATE,
a.INVOICECUSTOMER,
a.ORDERCUSTOMER,
a.INVOICENUMBER,
a.SOURCENUMBER
from (
select
TRANSACTIONDATE,
INVOICECUSTOMER,
ORDERCUSTOMER,
INVOICENUMBER,
SOURCENUMBER,
row_number() over (partition by INVOICECUSTOMER order by TRANSACTIONDATE desc) as rownr
from FINANCIALSALESINVOICES
where TRANSACTIONDATE >= date '2015-01-01'
) a
where a.rownr = 1
order by a.INVOICECUSTOMER, a.TRANSACTIONDATE
See also Window (Analytical) Functions in the Firebird 3 release notes.
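If you want to check the row_number() pattern without a Firebird 3 install, the sketch below runs the same shape of query in SQLite via Python's sqlite3 module. The table name invoices, the reduced column list and the sample rows are all made up for illustration; only the partition/order/filter structure mirrors the answer. It assumes SQLite 3.25+ and ISO-formatted date text.

```python
import sqlite3

# Made-up sample rows: (transactiondate, invoicecustomer, invoicenumber).
SAMPLE_INVOICES = [
    ("2014-06-01", "alpha", "INV1"),  # before the cutoff
    ("2015-03-01", "alpha", "INV2"),
    ("2016-07-15", "alpha", "INV3"),  # most recent for alpha
    ("2014-12-31", "beta", "INV4"),   # beta has nothing on or after the cutoff
    ("2015-01-01", "gamma", "INV5"),  # exactly on the cutoff, included by >=
]


def latest_invoice_per_customer(invoices, cutoff):
    """Return the most recent invoice on or after cutoff for each customer."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE invoices (transactiondate TEXT, invoicecustomer TEXT, invoicenumber TEXT)"
    )
    con.executemany("INSERT INTO invoices VALUES (?, ?, ?)", invoices)
    query = """
        SELECT transactiondate, invoicecustomer, invoicenumber
        FROM (SELECT f.*,
                     ROW_NUMBER() OVER (PARTITION BY invoicecustomer
                                        ORDER BY transactiondate DESC) AS rownr
              FROM invoices AS f
              WHERE transactiondate >= ?) AS a
        WHERE rownr = 1  -- keep only the newest row in each customer's partition
        ORDER BY invoicecustomer
    """
    result = con.execute(query, (cutoff,)).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in latest_invoice_per_customer(SAMPLE_INVOICES, "2015-01-01"):
        print(row)
```

Unlike the MAX() subquery approach, row_number() returns exactly one row per customer even when two invoices share the latest date; which of the tied rows wins is arbitrary unless a tiebreaker such as the invoice number is added to the ORDER BY.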

Multiple Counts Over Multiple Dates

I am essentially doing the following query (edited):
Select count(orders)
From Orders_Table
Where Order_Open_Date<=##/##/####
and Order_Close_Date>=##/##/####
Where ##/##/#### is the same date in both conditions, so in essence the query gives the number of 'open' orders for any given day. However, I want this same count for every single day of a year and don't want to write a separate query for each day. I'm sorry, this is probably really simple, but I am new to SQL and I don't know how to search for an answer to this question, since my searches have come up with nothing. Thanks for any help you can offer.
Why not:
select Order_Date, count(orders) from Orders_Table group by Order_Date
And for the last year:
select Order_Date, count(orders) from Orders_Table where Order_Date > DATE_SUB(CURDATE(), INTERVAL 1 YEAR) group by Order_Date;
SELECT CONVERT(VARCHAR, Order_Date, 110), count(orders)
FROM Orders_Table
WHERE Order_Date BETWEEN #A AND #B
GROUP BY CONVERT(VARCHAR, Order_Date, 110)
If you want to have every day of the year, including those with no orders, you will need to generate a temporary table or similar containing every date in the range and left/right join it to the Orders_Table data. This depends upon which RDBMS you're using. In SQL Server I have done this using a user defined function which returns a table variable.
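Here is one possible sketch of that calendar-table idea, applied to the open-order count from the question (opened on or before the day, closed on or after it). It uses SQLite through Python's sqlite3 module, where a recursive CTE generates the dates instead of a temporary table or user-defined function. The lowercase table and column names, the order_id column and the sample rows are assumptions for illustration, and dates are assumed to be ISO text.

```python
import sqlite3

# Made-up sample rows: (order_id, order_open_date, order_close_date).
SAMPLE_ORDERS = [
    (1, "2024-01-01", "2024-01-03"),
    (2, "2024-01-02", "2024-01-02"),
    (3, "2024-01-05", "2024-01-06"),
]


def open_orders_per_day(orders, start, end):
    """Return (day, open_order_count) for every day from start to end inclusive."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE orders_table (order_id INTEGER, order_open_date TEXT, order_close_date TEXT)"
    )
    con.executemany("INSERT INTO orders_table VALUES (?, ?, ?)", orders)
    query = """
        WITH RECURSIVE calendar(day) AS (
            SELECT ?                                   -- first day of the range
            UNION ALL
            SELECT date(day, '+1 day') FROM calendar WHERE day < ?
        )
        SELECT c.day, COUNT(o.order_id) AS open_orders
        FROM calendar AS c
        LEFT JOIN orders_table AS o                    -- LEFT JOIN keeps days with no open orders
               ON o.order_open_date <= c.day
              AND o.order_close_date >= c.day
        GROUP BY c.day
        ORDER BY c.day
    """
    result = con.execute(query, (start, end)).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in open_orders_per_day(SAMPLE_ORDERS, "2024-01-01", "2024-01-05"):
        print(row)
```

COUNT(o.order_id) rather than COUNT(*) is what makes empty days report 0, because the unmatched calendar rows carry NULL in the order columns. Orders that are still open would need a NULL-aware close-date condition, which this sketch does not attempt.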