SQL combining GROUP BY and SUM

I need help with SQL. I have an SQLite table like so:
CREATE TABLE mytable (datetime DATE, type TEXT, amount REAL)
I need a query that sums amount for each type AND year-month (the year is extracted as well, since the data can span several years). I've gotten halfway there, but I'm a bit rusty on SQL.
sqlite> SELECT strftime('%Y',datetime) AS year, strftime('%m',datetime) AS month, type, amount FROM mytable ;
2009|06|Type1|-1000.0
2009|06|Type1|-100.0
2009|06|Type2|-100.0
2009|07|Type1|-214.91
2009|07|Type2|-485.0
I've tried a number of combinations of SUM and GROUP BY on my query above but none of them does what I want. What I want is a result something like:
2009|06|Type1|-1100.0
2009|06|Type2|-100.0
2009|07|Type1|-214.91
2009|07|Type2|-485.0
Yes, type should be a foreign key, I simplified things to make it easier to ask the question :)

SELECT strftime('%Y',datetime) AS year,
strftime('%m',datetime) AS month,
type,
Sum(amount) As Amount
FROM mytable
Group By 1, 2, 3
Note
Some databases don't support grouping by column position (GROUP BY 1, 2, 3), so there you would have to repeat the expressions:
SELECT strftime('%Y',datetime) AS year,
strftime('%m',datetime) AS month,
type,
Sum(amount) As Amount
FROM mytable
Group By strftime('%Y',datetime),
strftime('%m',datetime),
type
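To check the answer against the sample data from the question, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The days of the month are made up, since the question only shows year and month.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (datetime DATE, type TEXT, amount REAL)")

# Assumption: the day parts are invented; only year-month appears in the question.
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        ("2009-06-01", "Type1", -1000.0),
        ("2009-06-15", "Type1", -100.0),
        ("2009-06-20", "Type2", -100.0),
        ("2009-07-03", "Type1", -214.91),
        ("2009-07-09", "Type2", -485.0),
    ],
)

# Group by the three non-aggregated columns; ORDER BY makes the row order explicit.
rows = conn.execute(
    """
    SELECT strftime('%Y', datetime) AS year,
           strftime('%m', datetime) AS month,
           type,
           SUM(amount) AS amount
    FROM mytable
    GROUP BY 1, 2, 3
    ORDER BY 1, 2, 3
    """
).fetchall()

for row in rows:
    print("|".join(str(v) for v in row))
```

The two Type1 rows for 2009-06 collapse into one row, which is the result the question asks for.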

Related

Delete duplicates using dense rank

I have a sales data table with cust_ids and their transaction dates.
I want to create a table that stores, for every customer, their cust_id, their last purchased date (on the basis of transaction dates) and the count of times they have purchased.
I wrote this code:
SELECT
cust_xref_id, txn_ts,
DENSE_RANK() OVER (PARTITION BY cust_xref_id ORDER BY CAST(txn_ts as timestamp) DESC) AS rank,
COUNT(txn_ts)
FROM
sales_data_table
But I understand that the above code would give an output like this (attached example picture).
How do I modify the code to get an output like:
I am a beginner in SQL queries and would really appreciate any help! :)
This would be an aggregation query, which changes the table key from (customer_id, date) to (customer_id):
SELECT
cust_xref_id,
MAX(txn_ts) as last_purchase_date,
COUNT(txn_ts) as count_purchase_dates
FROM
sales_data_table
GROUP BY
cust_xref_id
You are looking for the last purchase date and the count of distinct transaction dates (if a person buys twice on the same date, it should count as one purchase).
You said you want a count of dates, but your sample data shows you want a count of distinct dates: customer 284214 transacted 9 times, but DISTINCT gives 7.
So here is the SQL you can use to get your result:
SELECT
cust_xref_id,
MAX(txn_ts) as last_purchase_date,
COUNT(DISTINCT txn_ts) as count_purchase_dates -- DISTINCT counts each date only once
FROM sales_data_table
GROUP BY 1
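The difference between the two answers is only COUNT versus COUNT(DISTINCT). A small sketch of both side by side, run against SQLite through Python's sqlite3 module; the customer ids and dates are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_data_table (cust_xref_id INTEGER, txn_ts TEXT)")

# Hypothetical data: customer 1 buys twice on the same date, then once more later.
conn.executemany(
    "INSERT INTO sales_data_table VALUES (?, ?)",
    [
        (1, "2021-01-01"),
        (1, "2021-01-01"),
        (1, "2021-01-05"),
        (2, "2021-02-01"),
    ],
)

rows = conn.execute(
    """
    SELECT cust_xref_id,
           MAX(txn_ts)            AS last_purchase_date,
           COUNT(txn_ts)          AS count_purchases,
           COUNT(DISTINCT txn_ts) AS count_purchase_dates
    FROM sales_data_table
    GROUP BY cust_xref_id
    ORDER BY cust_xref_id
    """
).fetchall()

for row in rows:
    print(row)
```

For customer 1 the plain count is 3 while the distinct count is 2, so which one to use depends on whether repeat purchases on the same date should count separately.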

SQL to find unique counts between two date fields

I was reading this but can't manage to hack it to work on my own problem.
My data has the following fields, in a single table in Postgres:
Seller_id (varchar) (contains duplicates).
SKU (varchar) (contains duplicates).
selling_starts (datetime).
selling_ends (datetime).
I want to query it so I get the count of unique SKUs on sale, per seller, per day. If there are any null days I don't need these.
I've tried before querying it by using another table to generate a list of unique "filler" dates and then joining it to where the date is more than the selling_starts and less than the selling_ends fields. However, this is so computationally expensive that I get timeout errors.
I'm vaguely aware there are probably more efficient ways of doing this via with statements to create CTEs or some sort of recursive function, but I don't have any experience of this.
Any help much appreciated!
Try this:
WITH list AS
( SELECT generate_series(date_trunc('day', min(selling_starts)), max(selling_ends), '1 day') AS ref_date
FROM your_table
)
SELECT seller_id
, l.ref_date
, count(DISTINCT sku) AS sku_count
FROM your_table AS t
INNER JOIN list AS l
ON t.selling_starts <= l.ref_date
AND t.selling_ends > l.ref_date
GROUP BY seller_id, l.ref_date
If your_table is large, you should create indexes to accelerate the query.
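The same join logic can be tried locally without Postgres. The sketch below is not the answer's query: SQLite has no built-in generate_series, so a recursive CTE is swapped in to build the list of days, and the dates are stored as ISO text. Table contents are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE your_table "
    "(seller_id TEXT, sku TEXT, selling_starts TEXT, selling_ends TEXT)"
)

# Hypothetical listings; seller s1 lists SKU A twice with overlapping periods.
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?, ?, ?)",
    [
        ("s1", "A", "2021-01-01", "2021-01-03"),
        ("s1", "B", "2021-01-02", "2021-01-04"),
        ("s1", "A", "2021-01-02", "2021-01-03"),
        ("s2", "C", "2021-01-03", "2021-01-04"),
    ],
)

rows = conn.execute(
    """
    WITH RECURSIVE list(ref_date) AS (
        -- Recursive CTE standing in for Postgres generate_series
        SELECT date(min(selling_starts)) FROM your_table
        UNION ALL
        SELECT date(ref_date, '+1 day') FROM list
        WHERE ref_date < (SELECT date(max(selling_ends)) FROM your_table)
    )
    SELECT t.seller_id,
           l.ref_date,
           count(DISTINCT t.sku) AS sku_count
    FROM your_table AS t
    INNER JOIN list AS l
        ON t.selling_starts <= l.ref_date
       AND t.selling_ends > l.ref_date
    GROUP BY t.seller_id, l.ref_date
    ORDER BY t.seller_id, l.ref_date
    """
).fetchall()

for row in rows:
    print(row)
```

Because of the inner join, days on which a seller has nothing on sale produce no row, which matches the "no null days" requirement. Note that the end date is treated as exclusive here, as in the answer's join condition.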

SQL How to add a column to a table that's the sum of a category's quantity

I have a data table shaped like below:
What I'm looking to do with SQL is add a column that will be the sum for the total category by month without removing any rows. For Example,
My goal is to take this category data and do some calculations with it like dividing it by the Qty and seeing how it changes over time.
What I've tried to do is use GROUP BY the category and date but that ends up with me losing the Item level data which I want to compare the Category level data to.
I also tried doing something like this
SELECT
Item, Category, Date, Qty, (sum(QTY) from TABLE)
FROM TABLE
but that only gives the sum of the QTY for the whole column not split out by Month/Year and Category.
Does anyone know what might help? I'm relatively new to using SQL so I hope I explained my question properly.
Use window functions:
select t.*,
sum(qty) over (partition by category, date) as category_sum
from t;
This assumes that date is really just the month and year. If it is the exact date, you need to extract the month and year from it.
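For the exact-date case, a minimal sketch of extracting the month inside the PARTITION BY, run on SQLite through Python's sqlite3 module. It assumes SQLite 3.25 or newer (needed for window functions); the table name t follows the answer and the rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (item TEXT, category TEXT, date TEXT, qty INTEGER)")

# Hypothetical rows with exact dates rather than month/year values.
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        ("apple", "fruit", "2021-01-05", 3),
        ("pear", "fruit", "2021-01-20", 2),
        ("apple", "fruit", "2021-02-01", 4),
        ("kale", "veg", "2021-01-07", 7),
    ],
)

# Every row is kept; the window sum is computed per category and year-month.
rows = conn.execute(
    """
    SELECT item, category, date, qty,
           SUM(qty) OVER (
               PARTITION BY category, strftime('%Y-%m', date)
           ) AS category_sum
    FROM t
    ORDER BY date, item
    """
).fetchall()

for row in rows:
    print(row)
```

Since no rows are removed, item-level values such as qty can be divided by category_sum directly in the same SELECT.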

Bigquery - how to aggregate data based on conditions

I have a simple table like the following, which has product, price, cost and category. price and cost can be null.
This table is updated from time to time. Now I want a daily summary of the table content grouped by category, to see, in each category, how many products have no price, how many have a price, and how many have a price that is higher than the cost, so the result table would look like the following:
I think I can get a query running every day by setting up a query re-run schedule in BigQuery, so that three rows of data are appended to the result table every day.
But the problem is, how can I get those three rows? I know I can group by, but how do I get the count with those conditions like not null, larger than, etc.
You seem to want window functions:
select t.*,
countif(price is null) over (partition by date) as products_no_price,
countif(price <= cost) over (partition by date) as products_price_lower_than_cost
from t;
You can run this code on the table that has a date column. In fact, you don't need to store the last two columns.
If you want to insert the first table into the second, then there is no date and you can simply use:
select t.*,
countif(price is null) over () as products_no_price,
countif(price <= cost) over () as products_price_lower_than_cost
from t;
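The question also asks for the counts per category. One way to sketch that is conditional aggregation with GROUP BY. The example below runs on SQLite through Python's sqlite3 module, so SUM(CASE ...) stands in for BigQuery's COUNTIF; the table name products and the sample rows are assumptions, not taken from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (product TEXT, price REAL, cost REAL, category TEXT)"
)

# Hypothetical rows; price and cost may be NULL (None in Python).
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?, ?)",
    [
        ("p1", None, 5.0, "A"),
        ("p2", 10.0, 5.0, "A"),
        ("p3", 4.0, 5.0, "A"),
        ("p4", None, None, "B"),
        ("p5", 3.0, 1.0, "B"),
    ],
)

# One output row per category; each CASE contributes 1 when its condition holds.
# A comparison involving NULL is not true, so such rows fall through to ELSE 0.
rows = conn.execute(
    """
    SELECT category,
           SUM(CASE WHEN price IS NULL THEN 1 ELSE 0 END)     AS products_no_price,
           SUM(CASE WHEN price IS NOT NULL THEN 1 ELSE 0 END) AS products_with_price,
           SUM(CASE WHEN price > cost THEN 1 ELSE 0 END)      AS products_price_above_cost
    FROM products
    GROUP BY category
    ORDER BY category
    """
).fetchall()

for row in rows:
    print(row)
```

In BigQuery each SUM(CASE WHEN cond THEN 1 ELSE 0 END) could be written as COUNTIF(cond) inside the same GROUP BY query.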

Make table ID appear as a column and select across all tables

I've been asked by my superiors to write a query that will search every table in a database (each table represents a road and holds its traffic counts) and return the total counts of motorcycles by hour. Here's what I have so far, testing on one table:
WITH
totalCount AS
(
SELECT DATEDIFF(dd,0,event_time) AS DaySerial,
DATEPART(dd,event_time) AS theDay,
DATEDIFF(mm,0,event_time) AS MonthSerial,
DATEPART(mm,event_time) AS MonthofYear,
DATEDIFF(hh,0,event_time) AS HourSerial,
DATEPART(hh,event_time) AS Hour,
COUNT(*) AS HourlyCount,
DATEDIFF(yy,0,event_time) AS YearSerial,
DATEPART(yy,event_time) AS theYear
FROM [RUD].dbo.[10011E]
WHERE length <='1.7'
GROUP BY DATEDIFF(hh,0,event_time),
DATEPART(hh,event_time),
DATEDIFF(dd,0,event_time),
DATEPART(dd,event_time),
DATEDIFF(mm,0,event_time),
DATEPART(mm,event_time),
DATEDIFF(yy,0,event_time),
DATEPART(yy,event_time)
)
SELECT
theYear,
MonthofYear,
theDay,
Hour,
AVG(HourlyCount) AS Avg_Count
FROM
totalCount
GROUP BY
theYear,
MonthofYear,
theDay,
Hour
ORDER BY
theYear,
MonthofYear,
theDay,
Hour
Now I'm sure some of this is redundant or not needed, that's ok for now (I'm new to SQL btw, which is why some of this will be redundant). Basically as it stands, I list the year, month, date, hour and hourly count of motorcycles for one road. Now my two questions:
How do I take this query and make it so that it searches across every single table in the RUD database? Do I just need to list them all and UNION them, or is there a quicker way?
I realise if I search through every table gathering only the above (year, month, day, hour, hourly count) I will end up with the right data but with no way to distinguish which road all the counts are coming from. Is there a way to select the table ID (in this example, 10011E is the ID, and is the assigned name for a specific road) and place it in a column next to the rows that were selected from it?
If anyone needs clarification on what I mean, please let me know! Thanks!
One option would be to use UNION ALL and add an extra column identifying the source table. You'll have to write out each of your tables in this case, but it's perhaps your fastest option:
SELECT ID, 'YourTable' TableName
FROM YourTable
UNION ALL
SELECT ID, 'YourOtherTable'
FROM YourOtherTable
....
Alternatively, dynamic SQL could produce the same results. You might not have to type out all your table names, but it comes with a performance hit.
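A rough sketch of the dynamic approach, combining it with the UNION ALL idea: read the table names from the catalog, build one UNION ALL query that tags each row with its table name, then aggregate. This uses SQLite through Python's sqlite3 module rather than SQL Server, so sqlite_master and strftime stand in for the SQL Server catalog views and DATEPART; the second table name and all rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical setup: one table per road, named after the road id.
for road in ("10011E", "10012W"):
    conn.execute('CREATE TABLE "{0}" (event_time TEXT, length REAL)'.format(road))
conn.executemany(
    'INSERT INTO "10011E" VALUES (?, ?)',
    [
        ("2020-01-01 08:15:00", 1.5),
        ("2020-01-01 08:40:00", 1.6),
        ("2020-01-01 09:05:00", 4.2),  # too long to be a motorcycle
    ],
)
conn.execute('INSERT INTO "10012W" VALUES (?, ?)', ("2020-01-01 08:30:00", 1.2))

# Read the table names from the catalog instead of typing them out.
tables = [
    r[0]
    for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]

# Build one SELECT per table, tagging every row with its source table name.
# The names come from the catalog, not from user input, so plain formatting is used.
template = 'SELECT \'{0}\' AS road_id, event_time, length FROM "{0}"'
union_sql = " UNION ALL ".join(template.format(t) for t in tables)

rows = conn.execute(
    """
    SELECT road_id,
           strftime('%Y-%m-%d %H', event_time) AS hour_start,
           COUNT(*) AS hourly_count
    FROM ({0})
    WHERE length <= 1.7
    GROUP BY road_id, hour_start
    ORDER BY road_id, hour_start
    """.format(union_sql)
).fetchall()

for row in rows:
    print(row)
```

The road_id column answers the second question: each count stays attached to the table it came from.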