Aggregated data from transactional table for sparklines - sql

I'm working on an Ruby-on-Rails app which contains a list type of report. Two columns within that table are an aggregation from a transactional table.
So let's say we have these two tables:
**items**
id
name
group
price
**transactions**
id
item_id
type
date
qty
These two tables are connected with item_id in the transactions table.
Now I want to show some set of lines within the items table in a table and have two calculated columns within that table:
Calculated column 1 (Sparkline data):
Sparkline for transactions for the item with type="actuals" for the last 12 months. The result from the database should be text with aggregated qty for each month seperated by comma. Example:
15,20,0,12,44,33,6,4,33,23,11,65
Calculated column 2 (6m total sale):
Total qty for the item multiplied by sale for the last 6 months.
So the results would how columns like these:
Item name - Sparkline data - 6m total sale
So the result could by many thousand of lines, but would probably be paged.
So the question is, how is the most straightforward way of doing this in Rails models which doesn't sacrifice to much performance? Although this is a ruby-on-rails question it might contain more of a sql type solution.

The core sql could be something similar:
select
i.id,
i.name,
y.sparkline,
i.price*s.sum totalsale6m
from
items i left join
(select
x.item_id,
GROUP_CONCAT(x.sumqtd order by datemonth asc SEPARATOR ',') sparkline
from
(select
t.item_id,
date_format(date, '%m') datemonth,
sum(qtd) sumqtd
from
transactions t
where
t.type='actuals' and
t.date>date_sub(now(), interval 1 year)
group by
t.item_id, datemonth
) x
group by
x.item_id
) y on i.id=y.item_id
left join
(select
t.item_id,
sum(qtd) sumqtd
from
transactions t
where
t.date>date_sub(now(), interval 6 month)
group by
t.item_id
) s on i.id=s.item_id
group by
i.id, i.name
A few comments:
I wasn't able to test it without real data.
If there are gaps in the sales, I mean no sales in a given month, then the list will not contain 12 elements. In this case you need to adjust x,y tables
If you need the result only for a given few items, then probably you can put the item id filter deeper into the subqueries sparing time.

Related

Query two tables joined over a third table with the same foreign key?

I have a Postgres database schema groceries.
There are two tables purchases 19 and 20 connected over a third one categories.
I can join every table alone with categories without problem.
For calculating the year change I need 19 and 20 together.
It seems the problem is that the third table categories has got only one foreign key for both tables. Thus it return every time a col with zeros because there is no match for one table. Maybe I am wrong.
Any suggestions to query the tables?
More info below.
The groceries database has a subset dairies: 'whole milk','yogurt', 'domestic eggs'.
There are no clear primary keys.
I share the database file with this link:
https://drive.google.com/drive/folders/1BBXr-il7rmDkHAukETUle_ZYcDC7t44v?usp=sharing
I want to answer:
For each month of 2020, what was the percentage increase or decrease in total monthly dairy purchases compared to the same month in 2019 (i.e., the year_change)?
How can I do this?
I have tried different queries along this line:
SELECT
a.month,
COUNT(a.purchaseid) as sales_2020,
COUNT(b.purchase_id) as sales_2019,
ROUND(((CAST(COUNT(purchaseid) as decimal) /
(SELECT COUNT(purchaseid)FROM purchases_2020)) *100),2)
as market_share,
(COUNT(a.purchaseid) - COUNT(b.purchase_id) ) as year_change
FROM purchases_2020 as a
Left Outer Join categories as cat ON a.purchaseid = cat.purchase_id
Left Outer Join purchases_2019 as b ON cat.purchase_id = b.purchase_id
WHERE cat.category in ('whole milk','yogurt', 'domestic eggs')
GROUP BY a.month
ORDER BY a.month
;
It gives me either no result or the result above with an empty sales_2019 column.
The expected result is a table
with the monthly dairy sales for 2020, the montly market share of dairies of all products in 2020, and the monthly year change between 2019 and 2020 in percentage.
How can I calculate the year change?
Thanks for your help.
%%sql
postgresql:///groceries
with p2019Sales as (
select
month,
count(p.purchase_id) as total_sales
from purchases_2019 p
left join categories c
using (purchase_id)
where c.category in ('whole milk', 'yogurt' ,'domestic eggs')
group by month
order by month
),
mkS as (
select
cast(extract(month from fulldate::date)as int) as month,
count(*) as total_share
from purchases_2020
group by month
order by month
),
p2020Sales as (
select
cast(extract(month from fulldate::date)as int) as month,
count(p.purchaseid) as total_sales,
round(count(p.purchaseid)*100::numeric/ m.total_share,2) as market_share,
sum(count(*)) over() as tos
from purchases_2020 p
left join categories c
on p.purchaseid = c.purchase_id
left join mks m
on cast(extract(month from p.fulldate::date)as int) = m.month
where c.category in ('whole milk', 'yogurt' ,'domestic eggs')
group by 1,m.total_share
order by 1,m.total_share
),
finalSale as (
select
month,
p2.total_sales,
p2.market_share,
round((p2.total_sales - p1.total_sales)*100::numeric/p1.total_sales,2) as year_change
from p2019Sales p1
inner join p2020Sales p2
using(month)
)
select *
from finalSale
The answer of user18262778 is excellent.
but as Jeremy Caney is stating:
" add additional details that will help others understand how this addresses the question asked."
I deliver some details.
My goal:
get the output I want in one query
My problem:
The query is long and complicated.
There are several approaches to the problem:
joins
subqueries
All are prone to circular dependencies.
The subqueries and joins produce results,but discard data necessary to move on further towards the final result
The solution:
The with statement allows to compute the aggregation and reference this by name within the query.
If you know it is the WITH statement, then there is of course a lot of info on the web. The description below summarises exactly the benefits of the given solution in general.
"In PostgreSQL, the WITH query provides a way to write auxiliary statements for use in a larger query. It helps in breaking down complicated and large queries into simpler forms, which are easily readable. These statements often referred to as Common Table Expressions or CTEs, can be thought of as defining temporary tables that exist just for one query.
The WITH query being CTE query, is particularly useful when subquery is executed multiple times. It is equally helpful in place of temporary tables. It computes the aggregation once and allows us to reference it by its name (may be multiple times) in the queries.
The WITH clause must be defined before it is used in the query."
PostgreSQL - WITH Clause

I want NAV price as per (Today date minus 1) date

I have two tables. One is NAV where product daily new price is updated. Second is TDK table where item wise stock is available.
Now I want to get a summery report as per buyer name where all product wise total will come and from table one latest price will come.
I have tried below query...
SELECT dbo.TDK.buyer, dbo.NAV.Product_Name, sum(dbo.TDK.TD_UNITS) as Units, sum(dbo.TDK.TD_AMT) as 'Amount',dbo.NAV.NAValue
FROM dbo.TDK INNER JOIN
dbo.NAV
ON dbo.TDK.Products = dbo.NAV.Product_Name
group by dbo.TDK.buyer, dbo.NAV.Product_Name, dbo.NAV.NAValue
Imnportant: Common columns in both tables...
Table one NAV has column as Products
Table two TDK has column as Product_Name
If I have NAValue 4 records for one product then this query shows 4 lines with same total.
What I need??
I want this query to show only one line with latest NAValue price.
I want display one more line with Units*NAValue (latest) as "Latest Market Value".
Please guide.
What field contains the quote date? I am assuming you have a DATIME field, quoteDate, in dbo.NAV table and my other assumption is that you only store the Date part (i.e. mid-night, time = 00:00:00).
SELECT
t.buyer,
n.Product_Name,
sum(t.TD_UNITS) as Units,
sum(t.TD_AMT) as 'Amount',
n.NAValue
FROM dbo.TDK t
INNER JOIN dbo.NAV n
ON t.Products = n.Product_Name
AND n.quoteDate > getdate()-2
group by t.buyer, n.Product_Name, n.NAValue, n.QuoteDate
GetDate() will give you the current date and time. Subtracting 2 would get it before yesterday but after the day before yesterday.
Also, add n.quoteDate in your select and group by. Even though you don't need it, in case that one day you have a day of bad data with double record in NAV table, one with midnight time and another with 6 PM time.
Your code looks like SQL Server. I think you just want APPLY:
SELECT t.buyer, n.Product_Name, t.TD_UNITS as Units, t.TD_AMT as Amount, n.NAValue
FROM dbo.TDK t CROSS APPLY
(SELECT TOP (1) n.*
FROM dbo.NAV n
WHERE t.Products = n.Product_Name
ORDER BY ?? DESC -- however you define "latest"
) n;

How to command google bigquery to show all row of time series data (don't omit row)

I use an easy command for showing sales volume by store by goods item and by month in google Big Query but there are omitted month as some items were not bought. So, I would to know is there coding for all months. enter image description here
Now I try to solve the problem by build time index and item_code and cross join with store but it must be massive row if I apply this with all item code.
SELECT bp_no,sales_month,sales_year,sum(report_qty) AS volume
FROM my_table
WHERE sales_year between 2016 and 2017 and bp_no is not null and report_qty is not null
GROUP BY bp_no,sales_month,sales_year
ORDER BY bp_no,sales_year, sales_month
enter image description here
You can create a temporary mapping table using a WITH statement that will hold your metadata for year, month, store, and item. It is slightly bootstrapped, but it should return your expected result.
QUERY
WITH META AS (
SELECT DISTINCT
YEAR,
MONTH,
STORE,
ITEM,
FROM
`project.dataset.your_table`,
UNNEST(GENERATE_ARRAY(1,12)) AS MONTH,
UNNEST(GENERATE_ARRAY(2016,2017) AS YEAR
ORDER BY 1,2
)
SELECT
META.YEAR,
META.MONTH,
META.STORE,
META.ITEM,
SUM(DATA.QUANTITY)
FROM
META
LEFT JOIN `project.dataset.your_table` AS DATA ON META.MONTH = DATA.MONTH
AND META.YEAR = DATA.YEAR
AND META.ITEM = DATA.ITEM
AND META.STORE = DATA.STORE
GROUP BY
1,2,3,4
You can obviously modify the upper bound and lower bound in the GENERATE_ARRAY function on the year if you want to look at a different time span.
It also assumes your month are in the form 1,2,3,...,12 and that both YEAR and MONTH have an int64 type. If not, you will have to CAST these filed to INT64.
OUTPUT
ADDITIONAL INFORMATION
More information about GENERATE_ARRAY
More information aboutUNNEST
Below is for BigQuery Standard SQL
#standardSQL
SELECT
bp_no,
sales_month,
sales_year,
SUM(IFNULL(report_qty, 0)) AS volume
FROM (SELECT DISTINCT bp_no FROM `project.dataset.my_table`),
UNNEST(GENERATE_ARRAY(1, 12)) sales_month,
UNNEST(GENERATE_ARRAY(2016, 2017)) sales_year
LEFT JOIN `project.dataset.my_table`
USING(bp_no, sales_month, sales_year)
GROUP BY bp_no, sales_month, sales_year

How to calculate difference between two rows in a date interval?

I'm trying to compare data from an Access 2010 database based on a date interval. Example I have items from various purchase orders and I want to maintain the history of these item's delivery to a warehouse. So my purchase order has a request for a quantity of 10 of a material, for example, and it can be partially delivered in many deliveries and I want to know how this delivery varied in a date interval. To fill the date field the criteria used is the following: if the item had an update in the QtyPending field, I copy the current row deactivating it with a booelan field, create a new entry with the current update date updating the QtyPending field, so the active record is the actual state of the item. So I have a table that holds informations about these items like that
PO POItem QtyPending Date Active
4500000123 10 10 01/09/2014 FALSE
4500000123 10 8 05/09/2014 TRUE
4500000122 30 5 03/09/2014 FALSE
4500000122 30 1 04/09/2014 TRUE
With this example, for the first item, it means that from date 01/09 to 04/09 the QtyPending field didn't suffer a variation, meaning that the supplier didn't make any delivery to me, but from 01/09 to 05/08 he delivered me a qty of 2 of a material. For the second one, from date 03/09 to 04/09 the supplier delivered me a qty of 4 of a material. So, if I were to be making a report query from 02/09/2014 to 04/09/2014, the expected output is like this:
PO POItem QtyDelivered
4500000123 10 0
4500000122 30 4
And a report from 31/08/2014 to 10/09/2014, would have this output
PO POItem QtyDelivered
4500000123 10 2
4500000122 30 4
I'm not coming up with a query to make this report. Can anyone help me?
There are many ways of solving this. The easiest one would be to simply make a query of all the necessary records between two dates, loop over them and insert into a temporary table the result. This temporary table can then be the source of your report. A lot of people will scream at you for not using a big query instead but getting the result that you want in the fastest and simplest way should be your priority.
Your problem with your schema is that you don't have the QtyDelivered stored for each record. If you would have it, it would be an easy thing to sum over it in order to get needed result. By not storing this value, you have transformed a simple and fast query into a much harder and slower one because you need to recalculate this value in some way or other and you must do this without forgetting the fact that it's possible to have more than two records.
For calculating this value, you can either use a sub-query to retrieve the value from the previous row or a Left join do to the same. Once you have this value, you can subtract these two to get the needed difference; allowing for the possibility of Null value if there is no previous row. Once you have these values, you can now sum over them to get the final result with a Group By. Notice that in order to perform these calculations, you need to have one or two more levels of subquery. The first query should be something like:
Select PO, POItem, QtyPending, (Select Top 1 QtyPending from MyTable T2 where T1.PO = T2.PO and T2.Date < T1.Date And (T2.Date between #Date1 and #Date2) Order by T2.Date Desc) as QtyPending2 from MyTable T1 Where T1.Date between #Date1 and #Date2) ...
With this as either another subquery or as a View, you can then compute the desired difference by comparing the values of QtyPending and QtyPending2; without forgetting that QtyPendin2 may be Null. The remaining steps are easy to do.
Notice that the above example is for SQL-Server, you might have to change it a little for Access. In any case, you can find here many examples on how to compare two rows under Access. As noted earlier, you can also use a Left Join instead of a subquery to compare your rows.
I came up with this query that solved the problem, it wasn't that simple
SELECT
ItmDtIni.PO
,ItmDtIni.POItem AS [PO Item]
,ROUND(ItmDtIni.QtyPending - ItmDtEnd.QtyPending, 3) AS [Qty Delivered]
,ROUND((ItmDtIni.QtyPending - ItmDtEnd.QtyPending) * ItmDtEnd.Price, 2) AS [Value delivered(US$)]
//Filtering subqueries to bring only the items in the date interval to make a self join
FROM (((SELECT
PO
,POItem
,QtyPending
,MIN(Date) AS MinDate
FROM Item
WHERE Date BETWEEN FORMAT(begin_date, 'dd/mm/yyyy') AND FORMAT(end_date, 'dd/mm/yyyy')
GROUP BY
PO
,POItem
,QtyPending) AS ItmDtIni
//Self join filtering to bring only items in the date interval with the previously filtered table
INNER JOIN (SELECT
PO
,POItem
,QtyPending
,Price
,MAX(Date) AS MaxDate
FROM Item
WHERE Date BETWEEN FORMAT(begin_date, 'dd/mm/yyyy') AND FORMAT(end_date, 'dd/mm/yyyy')
GROUP BY
PO
,POItem
,QtyPending
,Price) AS ItmDtEnd
ON ItmDtIni.PO = ItmDtEnd.PO
AND ItmDtIni.POItem = ItmDtEnd.POItem)
INNER JOIN PO
ON ItmDtEnd.PO = PO.Numero)
WHERE
//Showing only items that had a variation in the date interval
ROUND(ItmDtIni.QtyPending - ItmDtEnd.QtyPending, 3) <> 0
//Anchoring min date in the interval for each item found by the first subquery
AND ItmDtIni.MinDate = (SELECT MIN(Item.Date)
FROM Item
WHERE
ItmDtIni.PO = Item.PO
AND ItmDtIni.POItem = Item.POItem
AND Date BETWEEN FORMAT(begin_date, 'dd/mm/yyyy') AND FORMAT(end_date, 'dd/mm/yyyy'))
//Anchoring max date in the interval for each item found by the second subquery
AND ItmDtEnd.MaxDate = (SELECT MAX(Item.Date)
FROM Item
WHERE
ItmDtEnd.PO = Item.PO
AND ItmDtEnd.POItem = Item.POItem
AND Date BETWEEN FORMAT(begin_date, 'dd/mm/yyyy') AND FORMAT(end_date, 'dd/mm/yyyy'))

analyze range and if true tell me

I want to see if the price of a stock has changed by 5% this week. I have data that captures the price everyday. I can get the rows from the last 7 days by doing the following:
select price from data where date(capture_timestamp)>date(current_timestamp)-7;
But then how do I analyze that and see if the price has increased or decreased 5%? Is it possible to do all this with one sql statement? I would like to be able to then insert any results of it into a new table but I just want to focus on it printing out in the shell first.
Thanks.
It seems odd to have only one stock in a table called data. What you need to do is bring the two rows together for last week's and today's values, as in the following query:
select d.price
from data d cross join
data dprev
where cast(d.capture_timestamp as date = date(current_timestamp) and
cast(dprev.capture_timestamp as date) )= cast(current_timestamp as date)-7 and
d.price > dprev.price * 1.05
If the data table contains the stock ticker, the cross join would be an equijoin.
You may be able to use query from the following subquery for whatever calculations you want to do. This is assuming one record per day. The 7 preceding rows is literal.
SELECT ticker, price, capture_ts
,MIN(price) OVER (PARTITION BY ticker ORDER BY capture_ts ROWS BETWEEN 7 PRECEDING AND CURRENT ROW) AS min_prev_7_records
,MAX(price) OVER (PARTITION BY ticker ORDER BY capture_ts ROWS BETWEEN 7 PRECEDING AND CURRENT ROW) AS max_prev_7_records
FROM data