SQL query question - sql

I'm trying to do something in a query that I've never done before. It probably requires variables, which I've never used, and I'm not even sure that it does.
What I want is to get a list of sales, grouped first by affiliate, then by month.
I can do that, but here's the twist... I don't want the calendar month, but month 1, month 2, month 3...
And those aren't Jan, Feb, March, but the number of months since the day of the first sale.
Is this possible in a query at all, or do I need to do this in my code?
Oh, MySQL 5.1.something...

Sure, just write an expression in SQL that generates the number of months since the first sale. (Do you mean the first sale for that affiliate? If so, you'll need a subquery.)
And since you say you want a list of sales, I assume you don't really want to "Group By" affiliate and month count; you just want to sort, or "Order By", those values.
If you wanted the average sales amount, or the count of sales, or some other aggregate function of sales data, then you would be doing a "Group By"...
And I don't think you need to worry about sorting by the number of months; you can simply sort by the difference between each sales date and the earliest sale date for each affiliate. (If you wanted to apply a third sorting rule, after the sales date sort, then you would need to be more careful.)
Select * From Sales S
Order By Affiliate,
SalesDate - (Select Min(SalesDate)
From Sales
Where Affiliate = S.Affiliate)
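Or, if you really need it to be by the difference in months (TimestampDiff counts whole months and, unlike subtracting Month() values, stays correct across year boundaries):
Select * From Sales S
Order By Affiliate,
TimestampDiff(Month,
(Select Min(SalesDate)
From Sales
Where Affiliate = S.Affiliate),
SalesDate)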
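If you do want totals per affiliate per "month 1, month 2, ..." rather than a sorted list, here is a minimal sketch of the same subquery idea. It runs against an in-memory SQLite database from Python purely so it is self-contained; the Amount column and the sample rows are made up, and the date arithmetic is SQLite's (on MySQL you would use its own date functions). It counts calendar months since the month of the affiliate's first sale, which is one possible reading of the question.

```python
import sqlite3

# Hypothetical Sales table; Affiliate and SalesDate follow the queries above,
# Amount is an assumed column for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales (Affiliate TEXT, SalesDate TEXT, Amount REAL)")
conn.executemany(
    "INSERT INTO Sales VALUES (?, ?, ?)",
    [
        ("A", "2009-11-15", 10.0),
        ("A", "2009-12-02", 20.0),
        ("A", "2010-01-20", 30.0),
        ("B", "2010-03-05", 5.0),
        ("B", "2010-03-28", 7.0),
    ],
)

# MonthNo = calendar months elapsed since the affiliate's first sale, plus 1.
# year * 12 + month keeps the subtraction correct across year boundaries.
QUERY = """
SELECT s.Affiliate,
       (CAST(strftime('%Y', s.SalesDate) AS INTEGER) * 12
          + CAST(strftime('%m', s.SalesDate) AS INTEGER))
     - (CAST(strftime('%Y', f.FirstSale) AS INTEGER) * 12
          + CAST(strftime('%m', f.FirstSale) AS INTEGER)) + 1 AS MonthNo,
       SUM(s.Amount) AS Total
FROM Sales AS s
JOIN (SELECT Affiliate, MIN(SalesDate) AS FirstSale
      FROM Sales
      GROUP BY Affiliate) AS f
  ON f.Affiliate = s.Affiliate
GROUP BY s.Affiliate, MonthNo
ORDER BY s.Affiliate, MonthNo
"""
rows = conn.execute(QUERY).fetchall()
for affiliate, month_no, total in rows:
    print(affiliate, month_no, total)
```

Affiliate A's sales in November, December and January come out as months 1, 2 and 3, even though the calendar year changes in between.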

This is possible in standard SQL if you use what I like to call "SQL gymnastics". It can be done with subqueries.
But it looks incredibly ugly, is hard to maintain and it's really not worth it. You're far better off using one of the many programming languages that wrap SQL (such as PL/SQL) or even a general purpose language that can call SQL (such as Python).
The result will be in two languages but will be all the more understandable than the same thing written in just SQL.

Related

Query to find average stock ... with a twist

We are trying to calculate average stock from a movements table in a single SQL statement.
So far there is no problem with what we thought was the standard approach: since we don’t have daily stock figures, instead of adding up the daily stock and dividing by the number of days, we simply add up (movement * remaining days):
select sum(quantity*(END_DATE-move_date))/(END_DATE-START_DATE)
from move_table
where move_date<=END_DATE
This is a simplified example, in real life we already take care of the initial stock at the starting date. Let’s say there are no movements prior to start_date.
Quantity sign depends on move type (sale, purchase, inventory, etc).
Of course this is done grouping by product, warehouse, ... but you get the idea.
It works as expected and the calculus is fine.
But (there is always a “but”), our customer doesn’t like counting days when there is no stock (all stock sold out). So, he doesn’t like
Sum of (daily_stock) / number_of_days (which is what we calculate using different math)
Instead, he would like
Sum of (daily stock) / number_of_days_in_which_stock_is_not_zero
For sure we can do this in any programming language without much effort, but I was wondering how to do it using plain sql ... and wasn’t able to come up with a solution.
Any suggestion?
Consider creating a new table called something like Stock_EndOfDay_History that has the following columns.
stock#
date
stock_count_eod
This table would get a new row for each stock item at the start of a new day for the prior day. Rows could then be purged from this table once the applicable date value went outside the date window of interest.
To get the "number_of_days_in_which_stock_is_not_zero", use this.
SELECT COUNT(*) AS 'Not_Zero_Stock_Days' FROM Stock_EndOfDay_History
WHERE stock# = <stock#_value>
AND stock_count_eod <> 0
AND <date_window_clause>
Other approaches might attempt to just add a new column to the existing stock table to maintain a cumulative sum of the " number_of_days_in_which_stock_is_not_zero". But inevitably, questions will be asked as to how did the non-zero stock days count get calculated? Using this new table approach will address those questions better than the new column approach.
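As a rough sketch of how the end-of-day table answers the original question, the query below computes both the day count and the average over non-zero days in one pass. It uses an in-memory SQLite database from Python only to stay self-contained; stock_id and eod_date are stand-ins for the "stock#" and "date" columns, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# stock_id / eod_date are assumed names standing in for "stock#" and "date".
conn.execute(
    "CREATE TABLE Stock_EndOfDay_History ("
    "stock_id INTEGER, eod_date TEXT, stock_count_eod INTEGER)"
)
conn.executemany(
    "INSERT INTO Stock_EndOfDay_History VALUES (?, ?, ?)",
    [
        (1, "2013-01-01", 10),
        (1, "2013-01-02", 0),   # sold out
        (1, "2013-01-03", 0),   # still sold out
        (1, "2013-01-04", 20),
    ],
)

# Sum of daily stock divided by the number of days with non-zero stock.
# NULLIF avoids a division by zero when the item was never in stock.
QUERY = """
SELECT SUM(stock_count_eod) AS total_stock,
       COUNT(*) AS all_days,
       SUM(CASE WHEN stock_count_eod <> 0 THEN 1 ELSE 0 END) AS nonzero_days,
       1.0 * SUM(stock_count_eod)
           / NULLIF(SUM(CASE WHEN stock_count_eod <> 0 THEN 1 ELSE 0 END), 0)
           AS avg_over_nonzero_days
FROM Stock_EndOfDay_History
WHERE stock_id = ? AND eod_date BETWEEN ? AND ?
"""
total, all_days, nonzero_days, avg_nonzero = conn.execute(
    QUERY, (1, "2013-01-01", "2013-01-04")
).fetchone()
print(total, all_days, nonzero_days, avg_nonzero)
```

With these rows the plain average would divide by 4 days, while the customer's version divides by the 2 days on which there was stock.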

Ms ACCESS and queries: dates in graph not in order

I use queries in Ms ACCESS to create graphs (shown in forms) to represent monthly spend data on a supplier. I want the x axis to be the months in chronological order, and this is where I'm having issues.
The picture above shows that the x axis starts with april 2016, although the earliest date is august 2015.
The query code that creates the graph is the following:
SELECT (Format([DateStamp],"mmm"" '""yy")) AS Expr1, Sum([Item Master].SpendPerMaterial) AS Expr2
FROM [Item Master]
WHERE ((([Item Master].SupplierName)=[Forms]![Supplier History]![List0]))
GROUP BY (Format([DateStamp],"mmm"" '""yy")), (Year([DateStamp])*12+Month([DateStamp])-1);
[Item Master] is the table where all data is retrieved from. DateStamp refers to the column with months, SpendPerMaterial is the spend on a certain material in that month (which is aggregated since we look at the supplier level, not the material level), and List0 is a list where users can select a supplier from a list of suppliers.
You should never rely on the ordering of results from a query unless you include an explicit order by. In your case, the results are ordered by the columns alphabetically (because of the group by).
You can fix this by adding:
order by max([DateStamp])
to the query.
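I would add the following to your query, after your GROUP BY clause (ordering by the month-index expression that is already grouped, since a bare [DateStamp] is neither grouped nor aggregated in this query):
ORDER BY (Year([DateStamp])*12+Month([DateStamp])-1) ASC;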
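To illustrate the point about explicit ordering, here is a small sketch that groups by a text month label but sorts by an aggregate of the underlying date. It runs on an in-memory SQLite database from Python rather than Access, so the label is built with substr() instead of Format(); the ItemMaster table name and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified, hypothetical stand-in for the [Item Master] table.
conn.execute("CREATE TABLE ItemMaster (DateStamp TEXT, SpendPerMaterial REAL)")
conn.executemany(
    "INSERT INTO ItemMaster VALUES (?, ?)",
    [
        ("2015-08-10", 100.0),
        ("2016-04-03", 50.0),
        ("2015-12-01", 75.0),
        ("2015-08-25", 25.0),
    ],
)

# The label ("Aug '15") sorts alphabetically as text, so the ORDER BY uses
# MIN(DateStamp), which is chronological within each group.
QUERY = """
SELECT substr('JanFebMarAprMayJunJulAugSepOctNovDec',
              3 * CAST(strftime('%m', DateStamp) AS INTEGER) - 2, 3)
       || ' ''' || substr(DateStamp, 3, 2) AS MonthLabel,
       SUM(SpendPerMaterial) AS Spend
FROM ItemMaster
GROUP BY MonthLabel
ORDER BY MIN(DateStamp)
"""
rows = conn.execute(QUERY).fetchall()
for label, spend in rows:
    print(label, spend)
```

Sorted as text the labels would start with "Apr '16"; sorted by the date aggregate they start with "Aug '15".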
I tried the other suggestions on an aggregate totals-by-month report with no luck. The only way I could get the actual month labels was by putting labels directly beneath the chart, which means altering it every month!

SQL DateDiff Syntax

I have a homework problem that I'm having a lot of trouble with... I don't expect the answer and I truly want to learn it. Could somebody help me out with the syntax?
Problem:
For each Sales Order, show how many days it took to ship the order in order by the longest order, then by Sales Order Number. Display Sales Order Number and the number of days to ship. Include the orders that have not yet shipped.
So far I have:
SELECT SalesOrder.SalesOrderNumber,
DATEDIFF (d, MIN(SalesOrder.OrderDate), MAX(Shipment.ShipmentDate)) AS "DaysToShip"
FROM SalesOrder, Shipment
GROUP BY SalesOrder.SalesOrderNumber;
Sometimes it's helpful to see an intermediate form of your query to evaluate if it's providing the correct data at some stage.
Consider the following query, pulled from your example minus some elements:
SELECT SalesOrder.SalesOrderNumber, SalesOrder.OrderDate, Shipment.ShipmentDate
FROM SalesOrder, Shipment
You should observe the results of this query and see how they differ from what you expect. In this case, you haven't indicated how SalesOrder and Shipment are related. The result will be many more rows than there are orders, with each SalesOrder related to each and every other Shipment record (a cross-join).
Once you provide the correct join condition and achieve the desired results at that stage, try adding in aggregation (GROUP BY, MIN, MAX) and test that form of your query. Finally, when you're convinced that you have the correct inputs, add in DATEDIFF and you'll have your final query.
SELECT SalesOrder.SalesOrderNumber,
DATEDIFF (d, MAX(SalesOrder.OrderDate), MAX(Shipment.ShipmentDate)) AS "DaysToShip"
FROM SalesOrder, Shipment
GROUP BY SalesOrder.SalesOrderNumber;
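For comparison, here is a sketch of the staged approach described above with an explicit join condition. The question never shows how the two tables are related, so the Shipment.SalesOrderNumber column is an assumption, as are the sample rows. It runs on SQLite from Python, which has no DATEDIFF, so the day count is taken from julianday() instead. The LEFT JOIN keeps orders that have not shipped yet.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SalesOrder (SalesOrderNumber INTEGER, OrderDate TEXT)")
# Assumption: Shipment carries the order number it belongs to.
conn.execute("CREATE TABLE Shipment (SalesOrderNumber INTEGER, ShipmentDate TEXT)")
conn.executemany(
    "INSERT INTO SalesOrder VALUES (?, ?)",
    [(1, "2014-01-01"), (2, "2014-01-05"), (3, "2014-01-10")],
)
conn.executemany(
    "INSERT INTO Shipment VALUES (?, ?)",
    [(1, "2014-01-04"), (2, "2014-01-06"), (2, "2014-01-15")],
)

# LEFT JOIN keeps order 3, which has no shipment; its DaysToShip is NULL.
# MAX(ShipmentDate) measures up to the last shipment of a split order.
QUERY = """
SELECT o.SalesOrderNumber,
       CAST(julianday(MAX(s.ShipmentDate)) - julianday(o.OrderDate)
            AS INTEGER) AS DaysToShip
FROM SalesOrder AS o
LEFT JOIN Shipment AS s
       ON s.SalesOrderNumber = o.SalesOrderNumber
GROUP BY o.SalesOrderNumber, o.OrderDate
ORDER BY DaysToShip DESC, o.SalesOrderNumber
"""
rows = conn.execute(QUERY).fetchall()
for number, days in rows:
    print(number, days)
```

Without the ON condition every order would be paired with every shipment, which is the cross join described earlier.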

Structuring Databases for Financial Statements

I am looking for the best way to structure my database. I have quarterly financial statements for thousands of companies from 1997-2012. Each company has three different statements: an income statement, a balance sheet, and a cash flow statement.
I want to be able to perform calculations on the data such as adding up each quarter to get a yearly total for each line item on each statement.
I have tried two ways so far:
1) Storing each line item for each statement in its own table, i.e. Sales would be one table and have only sales data for all companies I am tracking, with company as the primary key and each quarter's data as a separate column. This seems like the easiest way to work with the data, but updating the data each quarter is time consuming as there are hundreds of tables.
Sales Table
Company q32012 q22012 q12012
ABC Co. 500 100 202
XYZ Co. 230 302 202
2) The other option, which is a little easier to update but makes the data harder to work with, is to have a separate table for each company for each statement. For example, the income statement for Royal Bank would have its own table, with the primary column being the line item.
Income Statement for Royal Bank
Line_Item q32012 q22012 q12012
Sales
Net Profit
The problem here is when I try to annualize this data, I get a really ugly output due to the group by
SELECT
(CASE WHEN Line_Item = 'Sales' THEN SUM(q4 + q3 + q2 + q1) ELSE '0' END) AS Sales2012,
(CASE WHEN Line_Item = 'NetProfit' THEN SUM(q4 + q3 + q2 + q1)
ELSE '0' END) AS Inventories2012
FROM dbo.[RoyalBankIncomeStatement]
GROUP BY Line_Item
Any help would be appreciated.
Whenever I've had to build a database for fiscal reports by fiscal quarter, month, or year or whatever, I've found it convenient to borrow a concept from star schema design and data warehousing, even if I'm not really building a DW.
The borrowed concept is to have a table, let's call it ALMANAC, that has one row for each date, keyed by the date. In this case a natural key works out well. Dependent attributes in here can be what fiscal month and quarter the date belongs to, whether the date was one where the enterprise was open for business (TRUE or FALSE), and whatever other quirks are in the company calendar.
Then, you need a computer program that just generates this table out of nothing. All the strange rules for the company calendar are embedded in this one program. The ALMANAC can cover a ten year period in a little over 3,650 rows. That's a tiny table.
Now every date in the operational data can be used like a foreign key into the ALMANAC table, provided you consistently use the Date datatype for dates. For example, each sale has a date of sale. Then aggregating by fiscal quarter, or fiscal year, or whatever you like is just a matter of joining operational data with the ALMANAC, and using GROUP BY and SUM() to get the aggregate you want.
It's simple, and it makes generating a whole raft of time period reports a breeze.
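A minimal sketch of the ALMANAC idea, using Python to generate the table and an in-memory SQLite database to hold it. The column names, the Sale table and the assumption that fiscal quarters are plain calendar quarters are all illustrative; a real company calendar would put its own rules into build_almanac.

```python
import sqlite3
from datetime import date, timedelta


def build_almanac(conn, start, end):
    """Create ALMANAC with one row per date from start to end inclusive.

    Assumes fiscal quarters equal calendar quarters; real calendar quirks
    would be encoded here and nowhere else.
    """
    conn.execute(
        "CREATE TABLE ALMANAC ("
        "cal_date TEXT PRIMARY KEY, fiscal_year INTEGER, fiscal_quarter INTEGER)"
    )
    rows = []
    day = start
    while day <= end:
        rows.append((day.isoformat(), day.year, (day.month - 1) // 3 + 1))
        day += timedelta(days=1)
    conn.executemany("INSERT INTO ALMANAC VALUES (?, ?, ?)", rows)


conn = sqlite3.connect(":memory:")
build_almanac(conn, date(2012, 1, 1), date(2012, 12, 31))

# Hypothetical operational table whose date column joins to the ALMANAC key.
conn.execute("CREATE TABLE Sale (sale_date TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO Sale VALUES (?, ?)",
    [
        ("2012-01-15", 100.0),
        ("2012-03-31", 50.0),
        ("2012-04-01", 25.0),
        ("2012-11-30", 10.0),
    ],
)

totals = conn.execute(
    """
    SELECT a.fiscal_year, a.fiscal_quarter, SUM(s.amount)
    FROM Sale AS s
    JOIN ALMANAC AS a ON a.cal_date = s.sale_date
    GROUP BY a.fiscal_year, a.fiscal_quarter
    ORDER BY a.fiscal_year, a.fiscal_quarter
    """
).fetchall()
almanac_rows = conn.execute("SELECT COUNT(*) FROM ALMANAC").fetchone()[0]
print(totals, almanac_rows)
```

Switching the report from quarters to fiscal years or months only changes the GROUP BY columns, not the operational data.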
My advice to you is to think about not using a SQL database to do this. Instead, think of using something like SQL Server Analysis Services (SSAS). If you want to get a quick start with SSAS, I recommend getting up to speed on PowerPivot for Excel. You can take the model you develop in PowerPivot and import it into SSAS when you're ready.
Why don't I recommend SQL? Because you're going to have a problem aggregating accounts in SQL Server. For example, your balance sheets aren't going to be something you're going to be able to aggregate easily in SQL -- Asking SQL Server for the "Cash" for 2010, for example means that you want to get the entry for the end of December 2010, not that you want to SUM all of the entries for Cash for that year (which would be a nonsense number). On the other hand, with income and expense accounts such as those which would appear on your income statements, you would want to SUM those values up. To make matters worse, some reports are going to have a mix of account types on them, which is going to make reporting quite difficult.
SSAS has provisions inside the product where it "knows" how to aggregate for your reports based on account type, and there are many tutorials out there which can show you how to set this up.
Either way, you're going to need to store your data somewhere before it goes into your reporting system or Analysis Services cube. In order to do that, you should structure your data something like this. Let's say you're storing your data in a table called Reports:
Reports
--------
[Effective Date]
[CompanyID]
[AccountID]
[Amount]
Your Account table would have the description of what you're trying to store (income, expenses, etc). Your [Effective Date] column would link back into a Dates table which describes to which year, quarter, etc., your data belongs. In essence, what I'm describing is a classic shape for reporting databases, called a star schema.
I would probably go with the following structure in one data table:
Company
StatementType
LineItem
FiscalYear
Q1, Q2, Q3, Q4
StatementType would be Income Statement, Balance Sheet or Cash Flow Statement. Line Item would be the coded/uncoded text of the item on the statement, Fiscal Year is 2012, 2011 and so on. You'd still need to make sure that Line Items are consistent across companies.
This structure would let you query for a flat statement:
select
    LineItem, Q1, Q2, Q3, Q4
from Data
where
    Company = 'Royal Bank'
    and FiscalYear = 2012
    and StatementType = 'Income Statement'
or QoQ:
select
    FiscalYear,
    Q1
from Data
where
    Company = 'Royal Bank'
    and StatementType = 'Income Statement'
    and LineItem = 'Sales'
order by FiscalYear
in addition to aggregates. You'd probably want to have another table for line items with some kind of an index reference to make sure you can pull the statement back in the original order of line items.
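As a sketch of the "aggregates" part, annualizing with this layout is a plain column sum with no CASE or GROUP BY tricks. The example runs on an in-memory SQLite database from Python; the figures are invented, and summing quarters like this only makes sense for income statement items, not balance sheet items.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Data ("
    "Company TEXT, StatementType TEXT, LineItem TEXT, FiscalYear INTEGER, "
    "Q1 REAL, Q2 REAL, Q3 REAL, Q4 REAL)"
)
# Made-up figures for illustration only.
conn.executemany(
    "INSERT INTO Data VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [
        ("Royal Bank", "Income Statement", "Sales", 2011, 100, 110, 120, 130),
        ("Royal Bank", "Income Statement", "Sales", 2012, 202, 100, 500, 300),
        ("Royal Bank", "Income Statement", "Net Profit", 2012, 20, 10, 50, 30),
    ],
)

# One row per line item and year, so the yearly total is just Q1..Q4 added up.
annual = conn.execute(
    """
    SELECT LineItem, FiscalYear, Q1 + Q2 + Q3 + Q4 AS Total
    FROM Data
    WHERE Company = ? AND StatementType = ?
    ORDER BY LineItem, FiscalYear
    """,
    ("Royal Bank", "Income Statement"),
).fetchall()
for line_item, year, total in annual:
    print(line_item, year, total)
```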

Is there a way to handle immutability that's robust and scalable?

Since BigQuery is append-only, I was thinking about stamping each record I upload to it with an 'effective date', similar to how PeopleSoft works, if anybody is familiar with that pattern.
Then, I could issue a select statement and join on the max effective date
select UTC_USEC_TO_MONTH(timestamp) as month, sum(amt)/100 as sales
from foo.orders as all
join (select id, max(effdt) as max_effdt from foo.orders group by id) as latest
on all.effdt = latest.max_effdt and all.id = latest.id
group by month
order by month;
Unfortunately, I believe this won't scale because of the BigQuery 'small joins' restriction, so I wanted to see if anyone else had thought about this use case.
Yes, adding a timestamp for each record (or in some cases, a flag that captures the state of a particular record) is the right approach. The small side of a BigQuery "Small Join" can actually return at least 8MB (this value is compressed on our end, so is usually 2 to 10 times larger), so for "lookup" table type subqueries, this can actually provide a lot of records.
In your case, it's not clear to me what the exact query you are trying to run is. It looks like you are trying to return the most recent sales times of every individual item, and then JOIN this information with the SUM of sales amt per month of each item? Can you provide more info about the query?
It might be possible to do this all in one query. For example, in our wikipedia dataset, an example might look something like...
SELECT contributor_username, UTC_USEC_TO_MONTH(timestamp * 1000000) as month,
SUM(num_characters) as total_characters_used
FROM [publicdata:samples.wikipedia]
WHERE contributor_username != '' AND contributor_username IS NOT NULL
AND timestamp > 1133395200 AND timestamp < 1157068800
GROUP BY contributor_username, month
ORDER BY contributor_username DESC, month DESC;
...to provide wikipedia contributions per user per month (like sales per month per item). This result is actually really large, so you would have to limit by date range.
UPDATE (based on comments below) a similar query that finds "num_characters" for the latest wikipedia revisions by contributors after a particular time...
SELECT current.contributor_username, current.num_characters
FROM
(SELECT contributor_username, num_characters, timestamp as time FROM [publicdata:samples.wikipedia] WHERE contributor_username != '' AND contributor_username IS NOT NULL)
AS current
JOIN
(SELECT contributor_username, MAX(timestamp) as time FROM [publicdata:samples.wikipedia] WHERE contributor_username != '' AND contributor_username IS NOT NULL AND timestamp > 1265073722 GROUP BY contributor_username) AS latest
ON
current.contributor_username = latest.contributor_username
AND
current.time = latest.time;
If your query requires you to first build a large aggregate (for example, you need to run essentially an accurate COUNT DISTINCT), another option is to break this query up into two queries. The first query could provide the max effective date by month along with a count and save this result as a new table. Then you could run a sum query on the resulting table.
You could also store monthly sales records in separate tables, and only query the particular table for the months you are interested in, simplifying your monthly sales summaries (this could also be a more economical use of BigQuery). When you need to find aggregates across all tables, you could run your queries with multiple tables listed after the FROM clause.
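To make the effective-date pattern from the question concrete, here is a small sketch of "keep only the latest version of each record, then aggregate". It runs on an in-memory SQLite database from Python, not on BigQuery, so it says nothing about join size limits; the order_month column and the sample rows are assumptions, while id, effdt and amt follow the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Append-only table: a correction is a new row with a later effdt.
conn.execute(
    "CREATE TABLE orders (id INTEGER, effdt TEXT, order_month TEXT, amt INTEGER)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, "2013-01-05", "2013-01", 1000),
        (1, "2013-01-09", "2013-01", 1500),  # corrected amount for order 1
        (2, "2013-01-20", "2013-01", 500),
        (3, "2013-02-02", "2013-02", 700),
        (3, "2013-02-03", "2013-02", 0),     # order 3 cancelled
    ],
)

# The subquery picks each id's latest effective date; joining back on
# (id, effdt) drops superseded rows before the monthly SUM.
QUERY = """
SELECT o.order_month AS month, SUM(o.amt) / 100.0 AS sales
FROM orders AS o
JOIN (SELECT id, MAX(effdt) AS max_effdt
      FROM orders
      GROUP BY id) AS latest
  ON o.id = latest.id AND o.effdt = latest.max_effdt
GROUP BY o.order_month
ORDER BY o.order_month
"""
rows = conn.execute(QUERY).fetchall()
for month, sales in rows:
    print(month, sales)
```

The two-step variant described above would save the result of the inner "latest" subquery as its own table first and run the monthly sum against that.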