SQL Pivot table, with multiple pivots on criteria - sql

Here is my dataset. It has a reservation (unique ID), a reservation_dt, a fiscal year (the same year for the most part), the month as both a number and a name, a reservation status, the total number reserved, and a counter (basically 1 for each reservation row).
These are my guidelines (each needs to become a column, by month):
Requested - count of all distinct reservations
Num_Requested - sum of total_number_requested by month
Booked - count of all distinct reservations where status is order created
Num_Booked - sum of total_number_requested by month where status is order created
Not_Booked - count of all distinct reservations where status is unfulfilled
Not_Num_Booked - sum of total_number_requested by month where status is unfulfilled
I am looking to translate this into a pivot table. This is what I've got so far, and I can't figure out why it's not working.
I figured I would turn each of the above guidelines into a column, using either sum(total_number_requested) or count(total_requested) where the reservation status is ... and so on.
I'm open to any other ideas for making this simpler and getting it to work.
SELECT [month_name],
fyear AS fyear,
Requested,
Num_Requested
FROM (SELECT reservation,
reservation_status,
total_number_requested,
fyear,
[month_name],
[month],
total_requested
FROM #temp2) SourceTable
PIVOT (SUM(total_number_requested)
FOR reservation_status IN ([Requested])) PivotNumbRequested PIVOT(COUNT(reservation)
FOR total_requested IN ([Num_Requested])) PivotCountRequested
WHERE [month] = 7
ORDER BY fyear,
[month];

Use conditional expressions inside the aggregates to emulate a pivot. Example:
SELECT fyear, [month], [month_name], COUNT(*) AS CountAll, SUM(total_number_requested) AS TotNum,
SUM(IIF(reservation_status = 'Order Created', total_number_requested, NULL)) AS SumCreated
FROM tablename
GROUP BY fyear, [month], [month_name]
More info:
SQLServer - Multiple PIVOT on same columns
Crosstab Query on multiple data points
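As a rough sketch of how the conditional-aggregation idea could cover all six requested columns, the following runs the query against an in-memory SQLite database from Python. The table name `reservations`, the sample rows, and the exact status strings ('Order Created', 'Unfulfilled') are assumptions made for illustration; CASE is used instead of IIF because it is portable across most databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE reservations (
        reservation INTEGER, fyear INTEGER, month INTEGER, month_name TEXT,
        reservation_status TEXT, total_number_requested INTEGER)
""")
# Hypothetical sample data: three reservations in July, one in August.
conn.executemany(
    "INSERT INTO reservations VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 2020, 7, "July", "Order Created", 10),
        (2, 2020, 7, "July", "Unfulfilled", 4),
        (3, 2020, 7, "July", "Order Created", 6),
        (4, 2020, 8, "August", "Unfulfilled", 3),
    ],
)

# One row per month; each CASE limits an aggregate to a single status.
pivot_sql = """
SELECT fyear, month, month_name,
       COUNT(DISTINCT reservation) AS Requested,
       SUM(total_number_requested) AS Num_Requested,
       COUNT(DISTINCT CASE WHEN reservation_status = 'Order Created'
                           THEN reservation END) AS Booked,
       SUM(CASE WHEN reservation_status = 'Order Created'
                THEN total_number_requested ELSE 0 END) AS Num_Booked,
       COUNT(DISTINCT CASE WHEN reservation_status = 'Unfulfilled'
                           THEN reservation END) AS Not_Booked,
       SUM(CASE WHEN reservation_status = 'Unfulfilled'
                THEN total_number_requested ELSE 0 END) AS Not_Num_Booked
FROM reservations
GROUP BY fyear, month, month_name
ORDER BY fyear, month
"""
rows = conn.execute(pivot_sql).fetchall()
for row in rows:
    print(row)
```

COUNT(DISTINCT CASE ... END) ignores the NULLs produced for non-matching rows, which is what restricts each count to one status.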

Related

SQL - Counting users that have multiple transactions and have at least one transaction that has been made within 7 days interval of the other one

Here is the task: count users who have multiple transactions, at least one of which was made within 7 days of another.
Structure of dataset: Row, userId, orderId, date
Date is formatted as YYYY-MM-DDTHH:MM:SS Example: 2016-09-16T11:32:06
I have completed the first part (counting users with multiple transactions), but I do not know how to do the second part in the same query. I will be thankful for help.
Here is the console:
query = '''
SELECT COUNT(*)
FROM
(SELECT userId FROM `dataset` GROUP BY userId HAVING COUNT(orderId) > 1)
'''
project_id = 'acdefg'
df = pd.io.gbq.read_gbq(query, project_id=project_id, dialect='standard')
display(df)
To solve this, you need to compare each record with the previous one: when was the last order from the same user? This points to partitions and window functions, in this case LAG.
One way is to partition the records by user, order them by orderDate, and then, for each record, look at the record just above it:
WITH intermediate_table AS (
SELECT
userId,
orderDate,
LAG(orderDate)
OVER (PARTITION BY userId ORDER BY orderDate) AS previous_order -- the orderDate of the record right above, once the orders are partitioned by userId and ordered by orderDate
FROM `dataset.table`
)
SELECT userId
FROM intermediate_table
WHERE DATE_DIFF(orderDate, previous_order, DAY) <= 7
GROUP BY userId
Once orderDate and previous_order are gathered in the same record, it's easy to compare them and check whether at most 7 days separate the two.
(GROUP BY is used for returning userIds only once in the resulting table)
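A minimal way to try the LAG idea locally is to run it on SQLite from Python (window functions need SQLite 3.25 or later). The table name `orders` and the sample rows are made up for illustration, and `julianday()` stands in for BigQuery's DATE_DIFF, so treat this as a sketch rather than the BigQuery query itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (userId TEXT, orderId INTEGER, orderDate TEXT)")
# Hypothetical sample data in the YYYY-MM-DDTHH:MM:SS format from the question.
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", [
    ("a", 1, "2016-09-01T10:00:00"),
    ("a", 2, "2016-09-05T09:30:00"),  # about 4 days after a's previous order
    ("b", 3, "2016-09-01T08:00:00"),
    ("b", 4, "2016-10-01T08:00:00"),  # 30 days after b's previous order
    ("c", 5, "2016-09-16T11:32:06"),  # c has a single order
])

sql = """
WITH with_previous AS (
    SELECT userId,
           orderDate,
           LAG(orderDate) OVER (PARTITION BY userId ORDER BY orderDate) AS previous_order
    FROM orders
)
SELECT COUNT(DISTINCT userId)
FROM with_previous
-- previous_order is NULL for a user's first order, so those rows drop out
WHERE julianday(orderDate) - julianday(previous_order) <= 7
"""
(user_count,) = conn.execute(sql).fetchone()
print(user_count)
```

Any user that passes the 7-day filter necessarily has at least two orders, so the "multiple transactions" condition does not need a separate HAVING clause here.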
This may be what you need:
-- for each order calculate the days since that customer's last order
WITH order_profiler AS (
SELECT
orderId,
orderDate,
custId,
DATE_DIFF(orderDate, LAG(orderDate) OVER (PARTITION BY custId ORDER BY orderDate), day) AS order_latency_days,
FROM
`dataset.table`
)
SELECT
custId,
FROM order_profiler
WHERE order_latency_days <= 7
GROUP BY custId

IF in sql to choose which values to select

I am trying to use an IF or CASE statement in sql to choose when to select a value in a column. Essentially I have some data in a table like so:
My goal is to see which items are ordered multiple weeks in a row by the same customer. I have 1 month of dates, but I can do 7 separate queries with 1 query for each day of the week. I'm trying to do something like:
Select item, date, customer, truck
If customer, item combo appears in multiple weeks
Please let me know if you have any idea how I can do this!
Assuming you have at most one row per week per customer and item (as in the sample data), you can use lead() and lag(). The following assumes that you mean exactly 7 days apart:
select t.*
from (select t.*,
lag(orderdate) over (partition by customer, itemid order by orderdate) as prev_orderdate,
lead(orderdate) over (partition by customer, itemid order by orderdate) as next_orderdate
from t
) t
where prev_orderdate = orderdate - interval '7 day' or
next_orderdate = orderdate + interval '7 day';
Note that date/time functionality is highly database dependent, so you might have to adjust for your database functions.
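To illustrate how the lead()/lag() filter behaves, here is a small sketch on SQLite driven from Python (SQLite 3.25 or later for window functions). The sample rows are invented, and SQLite's `date(x, '+7 days')` replaces the `interval` arithmetic, which is one of the database-specific adjustments mentioned above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (customer TEXT, itemid TEXT, orderdate TEXT)")
# Hypothetical sample data.
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", [
    ("c1", "i1", "2021-03-01"),
    ("c1", "i1", "2021-03-08"),  # exactly 7 days after the previous order
    ("c1", "i1", "2021-03-22"),  # 14 days after the previous order
    ("c2", "i1", "2021-03-01"),
    ("c2", "i2", "2021-03-08"),  # same customer, different item
])

sql = """
SELECT customer, itemid, orderdate
FROM (SELECT t.*,
             LAG(orderdate)  OVER (PARTITION BY customer, itemid ORDER BY orderdate) AS prev_orderdate,
             LEAD(orderdate) OVER (PARTITION BY customer, itemid ORDER BY orderdate) AS next_orderdate
      FROM t) AS x
WHERE prev_orderdate = date(orderdate, '-7 days')
   OR next_orderdate = date(orderdate, '+7 days')
ORDER BY customer, itemid, orderdate
"""
rows = conn.execute(sql).fetchall()
for row in rows:
    print(row)
```

Both rows of a consecutive-week pair are returned: the earlier one matches through next_orderdate and the later one through prev_orderdate.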

Is there a way to count how many strings in a specific column are seen for the 1st time?

The value in column 2 is sometimes repeated because some clients make several transactions at different times (a client can make a transaction in the first month and then another in the following year).
Is there a way for me to count how many IDs are completely new per month through a group by (never seen before)?
Please let me know if you need more context.
Thanks!
A simple way is two levels of aggregation. The inner level gets the first date for each customer. The outer summarizes by year and month:
select year(min_date), month(min_date), count(*) as num_firsts
from (select customerid, min(date) as min_date
from t
group by customerid
) c
group by year(min_date), month(min_date)
order by year(min_date), month(min_date);
Note that date/time functions depend on the database you are using, so the syntax for getting the year/month from the date may differ in your database.
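The two-level aggregation can be tried out as follows on SQLite from Python. The sample rows are made up, and `strftime()` is SQLite's way of extracting the year and month, so it stands in for `year()`/`month()`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (customerid TEXT, date TEXT)")
# Hypothetical sample data: customers a and b come back after their first month.
conn.executemany("INSERT INTO t VALUES (?, ?)", [
    ("a", "2020-01-05"),
    ("b", "2020-01-20"),
    ("c", "2020-02-03"),
    ("b", "2020-02-15"),
    ("d", "2021-02-01"),
    ("a", "2021-02-10"),
])

sql = """
SELECT strftime('%Y', min_date) AS yr,
       strftime('%m', min_date) AS mon,
       COUNT(*) AS num_firsts
FROM (SELECT customerid, MIN(date) AS min_date  -- inner level: first date per customer
      FROM t
      GROUP BY customerid) AS c
GROUP BY yr, mon                                -- outer level: summarize by year and month
ORDER BY yr, mon
"""
rows = conn.execute(sql).fetchall()
for row in rows:
    print(row)
```

Repeat transactions (b in February 2020, a in February 2021) do not affect the counts, because each customer contributes only its earliest date.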
You can assign a rank to each transaction within its customer_id, so that rank 1 marks the first order for that customer_id.
The ranking is done in an inline view, and the outer query then returns the month and the count of customer IDs for that month, keeping ONLY rows whose rank = 1.
I have tested this on Oracle and it works as expected.
SELECT DISTINCT
EXTRACT(MONTH FROM date_of_transaction) AS month,
COUNT(customer_id)
FROM
(
SELECT
date_of_transaction,
customer_id,
RANK() OVER(PARTITION BY customer_id
ORDER BY
date_of_transaction ASC
) AS rank
FROM
table_1
)
WHERE
rank = 1
GROUP BY
EXTRACT(MONTH FROM date_of_transaction)
ORDER BY
EXTRACT(MONTH FROM date_of_transaction) ASC;
First, associate each ID with the year and month in which it is completely new (no earlier row exists for that ID), then count while grouping by year and month:
SELECT count(*) as new_customers, extract(year from t1.date) as year,
extract(month from t1.date) as month FROM table t1
WHERE not exists (SELECT 1 FROM table t2 WHERE t1.id = t2.id AND t2.date < t1.date)
GROUP BY year, month;
The result contains the new-customer count, the year, and the month.

I'm trying to calculate the difference between two weeks but I'm getting a weird peak when plotting the results ( SQL / BigQuery )

I have a daily table that contains the number of visitors per store, every day.
My table's columns are:
Date
Store
Number_of_Visitors
Views : number of views of the stores' ads.
I first aggregated my table into a weekly table so that I can calculate the variance between one week and the next.
Here is how I defined variance:
Variance = Number of Visitors in WEEK N+1 / Number of Visitors in WEEK N
I wrote the following query to do that (new table called: weekly):
SELECT
year_week,
min(date) as date,
Store,
SUM(Number_Of_Visitors) AS TOTAL_VISITORS
FROM (
SELECT
*,
CONCAT(CAST(EXTRACT(YEAR FROM date) AS STRING), LPAD(CAST(EXTRACT(WEEK FROM date) AS STRING), 2, '0')) AS year_week
FROM `my-project`)
GROUP BY
year_week, Store
ORDER BY year_week
Then, in order to calculate the variance, I used the following query as well:
SELECT
base.*,
((base.TOTAL_VISITORS-lw.TOTAL_VISITORS)/lw.TOTAL_VISITORS) AS VAR_FF,
FROM
`weekly` base
JOIN (
SELECT
* EXCEPT (date),
DATE_ADD(DATE(TIMESTAMP(date)), INTERVAL 1 Week)AS n_date
FROM
`weekly` ) lw
ON
base.date = lw.n_date
AND base.Store= lw.Store
When I plot the variance (VAR_FF) in Data Studio, I get the following plot, which doesn't seem to make sense given the high peak in the middle:
I am thinking your code should look like this:
SELECT date_trunc(date, week) as year_week,
Store,
SUM(Number_Of_Visitors) AS TOTAL_VISITORS,
(1 -
(LAG(SUM(Number_Of_Visitors)) OVER (PARTITION BY Store ORDER BY MIN(date)) /
SUM(Number_Of_Visitors)
)
) as VAR_FF,
FROM `my-project`
GROUP BY year_week, Store
ORDER BY year_week;
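I'm not sure what your week calculation is really doing, so this uses date_trunc() instead. Note that LAG() compares each week with the previous week present in the data for that store, which is not necessarily the previous calendar week.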
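For experimenting with the week-over-week comparison outside BigQuery, here is a small sketch on SQLite from Python (SQLite 3.25 or later). It computes the relative change (current - previous) / previous, as in the question's VAR_FF. The table name `daily`, the column name `visit_date`, and the sample rows are assumptions; `date(visit_date, 'weekday 6', '-6 days')` is used as a stand-in for truncating to a Sunday-based week.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily (visit_date TEXT, Store TEXT, Number_Of_Visitors INTEGER)")
# Hypothetical sample data covering three consecutive weeks for one store.
conn.executemany("INSERT INTO daily VALUES (?, ?, ?)", [
    ("2021-01-04", "S", 100), ("2021-01-06", "S", 100),
    ("2021-01-11", "S", 150), ("2021-01-13", "S", 150),
    ("2021-01-18", "S", 150),
])

sql = """
WITH weekly AS (
    -- 'weekday 6' moves forward to Saturday; '-6 days' then gives that week's Sunday
    SELECT date(visit_date, 'weekday 6', '-6 days') AS week_start,
           Store,
           SUM(Number_Of_Visitors) AS total_visitors
    FROM daily
    GROUP BY week_start, Store
)
SELECT week_start, Store, total_visitors,
       (total_visitors - LAG(total_visitors) OVER (PARTITION BY Store ORDER BY week_start)) * 1.0
       / LAG(total_visitors) OVER (PARTITION BY Store ORDER BY week_start) AS var_ff
FROM weekly
ORDER BY Store, week_start
"""
rows = conn.execute(sql).fetchall()
for row in rows:
    print(row)
```

The first week of each store has no predecessor, so its var_ff is NULL (None in Python) rather than a number.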

Last day of the month with a twist in SQLPLUS

I would appreciate a little expert help please.
In a SQL SELECT statement, I am trying to get the last day with data per month for the last year.
For example, I can easily get the last day of each month and join that to my data table, but if the last day of the month has no data, nothing is returned for that month. What I need is for the SELECT to return the last day with data for the month.
This is probably easy to do, but to be honest, my brain fart is starting to hurt.
I've attached the select below that works for returning the data for only the last day of the month for the last 12 months.
Thanks in advance for your help!
SELECT fd.cust_id,fd.server_name,fd.instance_name,
TRUNC(fd.coll_date) AS coll_date,fd.column_name
FROM super_table fd,
(SELECT TRUNC(daterange,'MM')-1 first_of_month
FROM (
select TRUNC(sysdate-365,'MM') + level as DateRange
from dual
connect by level<=365)
GROUP BY TRUNC(daterange,'MM')) fom
WHERE fd.cust_id = :CUST_ID
AND fd.coll_date > SYSDATE-400
AND TRUNC(fd.coll_date) = fom.first_of_month
GROUP BY fd.cust_id,fd.server_name,fd.instance_name,
TRUNC(fd.coll_date),fd.column_name
ORDER BY fd.server_name,fd.instance_name,TRUNC(fd.coll_date)
You probably need to group your data so that each month's data is in the group, and then within the group select the maximum date present. The sub-query might be:
SELECT MAX(coll_date) AS last_day_of_month
FROM Super_Table AS fd
GROUP BY YEAR(coll_date) * 100 + MONTH(coll_date);
This presumes that the functions YEAR() and MONTH() exist to extract the year and month from a date as an integer value. Clearly, this doesn't constrain the range of dates - you can do that, too. If you don't have the functions in Oracle, then you do some sort of manipulation to get the equivalent result.
Using information from Rhose (thanks):
SELECT MAX(coll_date) AS last_day_of_month
FROM Super_Table fd
GROUP BY TO_CHAR(coll_date, 'YYYYMM');
This achieves the same net result, putting all dates from the same calendar month into a group and then determining the maximum value present within that group.
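The grouping idea can be checked quickly on SQLite from Python. The sample rows are invented, and `strftime('%Y%m', ...)` plays the role of Oracle's `TO_CHAR(coll_date, 'YYYYMM')`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE super_table (cust_id INTEGER, server_name TEXT, coll_date TEXT)")
# Hypothetical sample data: neither month has a row on its calendar last day.
conn.executemany("INSERT INTO super_table VALUES (?, ?, ?)", [
    (1, "srv1", "2023-01-15"),
    (1, "srv1", "2023-01-28"),
    (1, "srv1", "2023-02-10"),
    (1, "srv1", "2023-02-27"),
])

sql = """
SELECT MAX(coll_date) AS last_day_with_data
FROM super_table
GROUP BY strftime('%Y%m', coll_date)  -- one group per calendar month
ORDER BY 1
"""
last_days = [row[0] for row in conn.execute(sql)]
print(last_days)
```

Because the maximum is taken over dates that actually occur in the table, a month is never dropped just because its calendar last day has no data.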
Here's another approach, if ANSI row_number() is supported:
with RevDayRanked(itemDate,rn) as (
select
cast(coll_date as date),
row_number() over (
partition by datediff(month,coll_date,'2000-01-01') -- rewrite datediff as needed for your platform
order by coll_date desc
)
from super_table
)
select itemDate
from RevDayRanked
where rn = 1;
Rows numbered 1 will be nondeterministically chosen among rows on the last active date of the month, so you don't need distinct. If you want information out of the table for all rows on these dates, use rank() over days instead of row_number() over coll_date values, so a value of 1 appears for any row on the last active date of the month, and select the additional columns you need:
with RevDayRanked(cust_id, server_name, coll_date, rk) as (
select
cust_id, server_name, coll_date,
rank() over (
partition by datediff(month,coll_date,'2000-01-01')
order by cast(coll_date as date) desc
)
from super_table
)
select cust_id, server_name, coll_date
from RevDayRanked
where rk = 1;
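The difference between the row_number() and rank() variants can be seen in a small SQLite experiment from Python (SQLite 3.25 or later). The sample rows are made up, `strftime('%Y%m', ...)` replaces the datediff(month, ...) partition key, and `date(coll_date)` replaces the cast to date, so this is only a sketch of the idea.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE super_table (cust_id INTEGER, server_name TEXT, coll_date TEXT)")
# Hypothetical sample data: two rows fall on the last active day of January.
conn.executemany("INSERT INTO super_table VALUES (?, ?, ?)", [
    (1, "srv1", "2023-01-15 09:00:00"),
    (1, "srv1", "2023-01-28 08:00:00"),
    (1, "srv2", "2023-01-28 17:30:00"),
    (1, "srv1", "2023-02-27 10:00:00"),
])

# Template: {fn} is the window function, {order_key} the ordering expression.
ranked_sql = """
WITH RevDayRanked AS (
    SELECT server_name, coll_date,
           {fn}() OVER (
               PARTITION BY strftime('%Y%m', coll_date)
               ORDER BY {order_key} DESC
           ) AS rk
    FROM super_table
)
SELECT server_name, coll_date
FROM RevDayRanked
WHERE rk = 1
ORDER BY coll_date
"""
# row_number() over full timestamps: exactly one row per month.
one_per_month = conn.execute(
    ranked_sql.format(fn="row_number", order_key="coll_date")).fetchall()
# rank() over days: every row on the last active day ties at rank 1.
all_on_last_day = conn.execute(
    ranked_sql.format(fn="rank", order_key="date(coll_date)")).fetchall()
print(one_per_month)
print(all_on_last_day)
```

With row_number() only the latest January row survives, while rank() over the day keeps both rows collected on 2023-01-28.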
If row_number() and rank() aren't supported, another approach is this (for the second query above). Select all rows from your table for which there's no row in the table from a later day in the same month.
select
cust_id, server_name, coll_date
from super_table as ST1
where not exists (
select *
from super_table as ST2
where datediff(month,ST1.coll_date,ST2.coll_date) = 0
and cast(ST2.coll_date as date) > cast(ST1.coll_date as date)
)
If you have to do this kind of thing a lot, see if you can create an index over computed columns that hold cast(coll_date as date) and a month indicator like datediff(month,'2001-01-01',coll_date). That'll make more of the predicates SARGs.
Putting the above pieces together, would something like this work for you?
SELECT fd.cust_id,
fd.server_name,
fd.instance_name,
TRUNC(fd.coll_date) AS coll_date,
fd.column_name
FROM super_table fd
WHERE fd.cust_id = :CUST_ID
AND TRUNC(fd.coll_date) IN (
SELECT MAX(TRUNC(coll_date))
FROM super_table
WHERE coll_date > SYSDATE - 400
AND cust_id = :CUST_ID
GROUP BY TO_CHAR(coll_date,'YYYYMM')
)
GROUP BY fd.cust_id,fd.server_name,fd.instance_name,TRUNC(fd.coll_date),fd.column_name
ORDER BY fd.server_name,fd.instance_name,TRUNC(fd.coll_date)