I have a subset-sum problem I cannot find the answer to. I am trying to write something in VBA for Access that takes every combination of summations within certain criteria and places them in a table so I can match a different table against it. Right now I am more concerned with creating the table of combinations. This is the first time I have asked a question, so sorry if I mess something up.
Example:
Access Table: ImpTable
Fields: ID, Year-Month, Name, Country, Quantity
I need to make every combination of summations where the Country and Year-Month are the same, while keeping track of what was included in each sum. If the new table keeps track of which IDs were included in each combination, I can reference the original table for the name.
Expected Ending Table Results:
NewID, Year-Month, Country, SumQuantity, ComboName (ID's from original table)
Any help is appreciated.
Raw Data:
ID Year-Month Name Country Quantity
1 2016-06 Person1 US 10
2 2016-06 Person2 US 12
3 2016-10 Person3 US 4
4 2016-06 Person4 UK 5
5 2016-06 Person5 UK 6
6 2016-06 Person6 US 3
Desired Results:
NewID Year-Month Country SumQuantity ComboName
1 2016-06 US 22 1,2
2 2016-06 US 13 1,6
3 2016-06 US 25 1,2,6
4 2016-06 US 15 2,6
5 2016-06 UK 11 4,5
6 2016-10 US 4 3
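A minimal sketch of the grouping-and-combination logic, written in Python rather than VBA so the idea is easy to check before porting it. The function name `build_combos` and the `min_size` parameter are hypothetical. The sketch assumes, based on the desired results above, that combinations need at least two rows unless a group contains only one row. The number of combinations grows as 2^n per group, so large groups will produce very large tables.

```python
from collections import defaultdict
from itertools import combinations

# Rows mirror the sample ImpTable: (ID, Year-Month, Name, Country, Quantity)
rows = [
    (1, "2016-06", "Person1", "US", 10),
    (2, "2016-06", "Person2", "US", 12),
    (3, "2016-10", "Person3", "US", 4),
    (4, "2016-06", "Person4", "UK", 5),
    (5, "2016-06", "Person5", "UK", 6),
    (6, "2016-06", "Person6", "US", 3),
]

def build_combos(rows, min_size=2):
    """Return (NewID, Year-Month, Country, SumQuantity, ComboName) tuples."""
    # Bucket the rows by (Year-Month, Country); only rows in the same
    # bucket may be summed together.
    groups = defaultdict(list)
    for row_id, year_month, _name, country, qty in rows:
        groups[(year_month, country)].append((row_id, qty))

    out = []
    new_id = 1
    for (year_month, country), members in groups.items():
        # Assumption: a group with a single row still yields that row.
        smallest = min(min_size, len(members))
        for size in range(smallest, len(members) + 1):
            for combo in combinations(members, size):
                ids = ",".join(str(row_id) for row_id, _ in combo)
                total = sum(qty for _, qty in combo)
                out.append((new_id, year_month, country, total, ids))
                new_id += 1
    return out

for line in build_combos(rows):
    print(line)
```

The NewID values depend on the order in which groups are visited, so they will not necessarily match the numbering in the desired results, only the set of combinations.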
I have a table that looks like this:
user  type      quantity  order_id  purchase_date
john  travel    10        1         2022-01-10
john  travel    15        2         2022-01-15
john  books     4         3         2022-01-16
john  music     20        4         2022-02-01
john  travel    90        5         2022-02-15
john  clothing  200       6         2022-03-11
john  travel    70        7         2022-04-13
john  clothing  70        8         2022-05-01
john  travel    200       9         2022-06-15
john  tickets   10        10        2022-07-01
john  services  20        11        2022-07-15
john  services  90        12        2022-07-22
john  travel    10        13        2022-07-29
john  services  25        14        2022-08-01
john  clothing  3         15        2022-08-15
john  music     5         16        2022-08-17
john  music     40        18        2022-10-01
john  music     30        19        2022-11-05
john  services  2         20        2022-11-19
where I have many different users and multiple types, with purchases made daily.
I want to end up with a table of this format
user  label            month
john  travel           2022-01-01
john  travel           2022-02-01
john  clothing         2022-03-01
john  travel-clothing  2022-04-01
john  travel-clothing  2022-05-01
john  travel-clothing  2022-06-01
john  travel           2022-07-01
john  travel           2022-08-01
john  services         2022-10-01
john  music            2022-11-01
where the label records the most popular type (based on % of quantity sold) for each user over the last 4 months (including the current month). For instance, in March 2022 john ordered 200/339 clothing (Jan up to and including Mar), so his label is clothing. But for months where two types are almost even I'd want a double label, as in April (185 travel and 200 clothing out of 409). The rules are not set in stone yet, but they are something like:
if two types are around even (e.g. each >40%), use both types in the label column;
if three types are around even (e.g. around 30% each), use three types as the label;
if one type is at 40% but the rest is made up of many small percentages, keep the first type;
and of course where one type is clearly a majority, use that.
One other tricky bit is that there might be missing months for a user.
I think that for the rules I just need to compare the % of each type, but I don't know how to retrieve the type as a label afterwards. In general, I don't have the SQL/BigQuery logic very clear in my head. I have tried some things, but nothing that comes close to the target table.
Broken down in steps, I think I need 3 things:
1. Group by user, type, month and get the partial and total count (I have done this)
2. Then retrieve the counts for the past 4 months (I have done something, but it's not exactly accurate yet)
3. Compare the ratios and make the label column
I'm not very clear on the SQL/BigQuery logic here, so please advise me on the correct steps to achieve the above. I'm working in BigQuery, but general SQL logic will also help.
Consider the approach below. It looks a little messy and has room for optimization, but I hope it gives you an idea or a direction for addressing your problem.
WITH aggregation AS (
  SELECT user, type, DATE_TRUNC(purchase_date, MONTH) AS month, month_no,
         SUM(quantity) AS net_qty,
         SUM(SUM(quantity)) OVER w1 AS rolling_qty
    FROM sample_table,
         UNNEST([EXTRACT(YEAR FROM purchase_date) * 12 + EXTRACT(MONTH FROM purchase_date)]) month_no
   GROUP BY 1, 2, 3, 4
  WINDOW w1 AS (
    PARTITION BY user ORDER BY month_no RANGE BETWEEN 3 PRECEDING AND CURRENT ROW
  )
),
rolling AS (
  SELECT user, month, ARRAY_AGG(STRUCT(type, net_qty)) OVER w2 AS agg, rolling_qty
    FROM aggregation
 QUALIFY ROW_NUMBER() OVER (PARTITION BY user, month) = 1
  WINDOW w2 AS (PARTITION BY user ORDER BY month_no RANGE BETWEEN 3 PRECEDING AND CURRENT ROW)
)
SELECT user, month, ARRAY_TO_STRING(ARRAY(
         SELECT type FROM (
           SELECT type, SUM(net_qty) / SUM(SUM(net_qty)) OVER () AS pct
             FROM r.agg GROUP BY 1
         ) QUALIFY IFNULL(FIRST_VALUE(pct) OVER (ORDER BY pct DESC) - pct, 0) < 0.10 -- set threshold to 0.1
       ), '-') AS label
  FROM rolling r
 ORDER BY month;
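To make the windowing and threshold rule easier to follow, here is a rough Python sketch of the same idea: sum quantities per type over the current month and the three preceding calendar months, then keep every type whose share is within 0.10 of the top share. The names `rolling_labels`, `window` and `threshold` are hypothetical, only the first rows of the sample table are included, and the ordering of types inside a label (largest share first) is an assumption, so April comes out as clothing-travel here.

```python
from collections import defaultdict
from datetime import date

# First rows of the sample table: (user, type, quantity, purchase_date)
purchases = [
    ("john", "travel", 10, date(2022, 1, 10)),
    ("john", "travel", 15, date(2022, 1, 15)),
    ("john", "books", 4, date(2022, 1, 16)),
    ("john", "music", 20, date(2022, 2, 1)),
    ("john", "travel", 90, date(2022, 2, 15)),
    ("john", "clothing", 200, date(2022, 3, 11)),
    ("john", "travel", 70, date(2022, 4, 13)),
]

def month_no(d):
    # Same trick as the query: a running month counter, so gaps between
    # months are respected by the window instead of being skipped over.
    return d.year * 12 + d.month

def rolling_labels(purchases, window=4, threshold=0.10):
    # (user, month_no) -> type -> quantity bought in that month
    monthly = defaultdict(lambda: defaultdict(int))
    for user, typ, qty, day in purchases:
        monthly[(user, month_no(day))][typ] += qty

    labels = {}
    for user, m in sorted(monthly):
        # Add up the current month and the (window - 1) months before it.
        totals = defaultdict(int)
        for back in range(window):
            for typ, qty in monthly.get((user, m - back), {}).items():
                totals[typ] += qty
        grand = sum(totals.values())
        top = max(totals.values()) / grand
        # Assumption: list types by descending quantity, then by name.
        ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
        keep = [typ for typ, qty in ranked if top - qty / grand < threshold]
        month_start = date((m - 1) // 12, (m - 1) % 12 + 1, 1)
        labels[(user, month_start)] = "-".join(keep)
    return labels

labels = rolling_labels(purchases)
for key, label in labels.items():
    print(key, label)
```

A single "distance from the top share" threshold is only one way to express the rules in the question. The 40% and 30% cut-offs described there would need their own conditions.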
I'm trying to get a result where only one row is returned for each TEAM and each PLACE.
The twist is that the highest score in each place should have priority.
My table currently looks something like this:
ENTRY_ID TEAM_ID DATE PLACE SCORE
1 1 2021-10-12 Ireland 64
2 2 2021-10-12 Ireland 31
3 3 2021-10-12 France 137
4 2 2021-10-12 France 61
5 5 2021-10-12 France 38
6 1 2021-10-12 France 66
7 2 2021-10-12 Italy 17
8 3 2021-10-12 Italy 61
9 1 2021-10-12 Italy 74
The competition is held at three different places at the same time, with technically all teams being able to have people playing in all of them at the same time.
Each team however can only win one point so, in the example, it's possible to see that Team 1 would win both in Italy and Ireland, but it should be awarded only one point for the highest score, so only Italy. The point in Ireland should go to the second place.
I've tried over 30 queries I've found in several correlated questions, but none of them seems to be applicable to my situation.
Basically:
"Return the highest score in each PLACE, but only pick each TEAM once.
If a TEAM has already been picked, ignore it and take the second place instead."
So I could retrieve all three winners with no further processing. The results I'm trying to achieve should repeat neither the TEAM_ID nor PLACE, in this particular example it should output:
3 FRANCE (Since it has the highest score in France at 137)
1 ITALY (For the highest score in Italy at 74)
2 IRELAND (For the second-highest score in Ireland, since Team 1 already won in Italy)
The production model of this table has far more entries so it's unlikely there would be any clashes with too many second-places.
How can I achieve that?
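One way to read the rule is as a greedy pass: walk the entries from highest score to lowest and award a place to a team only if neither that team nor that place has been used yet. The Python sketch below illustrates that reading. The function name `pick_winners` and the tie-break on ENTRY_ID are assumptions, and the greedy pass is a simple heuristic rather than a guaranteed optimal assignment.

```python
# Rows mirror the sample table: (ENTRY_ID, TEAM_ID, PLACE, SCORE)
entries = [
    (1, 1, "Ireland", 64),
    (2, 2, "Ireland", 31),
    (3, 3, "France", 137),
    (4, 2, "France", 61),
    (5, 5, "France", 38),
    (6, 1, "France", 66),
    (7, 2, "Italy", 17),
    (8, 3, "Italy", 61),
    (9, 1, "Italy", 74),
]

def pick_winners(entries):
    """Return {place: team_id}, using each team and each place at most once."""
    winners = {}
    used_teams = set()
    # Highest score first; ties fall back to the lower ENTRY_ID (assumption).
    for _entry_id, team, place, _score in sorted(entries, key=lambda e: (-e[3], e[0])):
        if team in used_teams or place in winners:
            continue  # team already has its point, or the place is decided
        winners[place] = team
        used_teams.add(team)
    return winners

print(pick_winners(entries))
```

On the sample data this gives France to team 3, Italy to team 1 and Ireland to team 2, matching the expected output in the question.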
I am working on a problem where I have the following table:
company_id  country  total_revenue
1           Russia   1200
2           Croatia  1200
2           Italy    1200
3           USA      1200
3           UK       1200
3           Italy    1200
There are 3 companies in this table, but company '2' and company '3' have offices in 2 and 3 countries respectively. All companies pay 1200 per month, and because company 2 has 2 offices it shows as if they paid 1200 per month 2 times, and because company 3 has 3 offices it shows as if it paid 1200 per month 3 times. Instead, I would like revenue to be equally distributed based on how many times company_id appears in the table. company_id will only appear more than once for every additional country in which a company is based.
Assuming each company always pays 1,200 per month, my desired output is:
company_id  country  total_revenue
1           Russia   1200
2           Croatia  600
2           Italy    600
3           USA      400
3           UK       400
3           Italy    400
Being new to SQL, I was thinking this could maybe be done with a CASE WHEN statement, but I have only learned to use CASE WHEN to output a string depending on a condition. Here, I am trying to assign an equal share of revenue to each of a company's countries, depending on how many countries the company is based in.
Thank you in advance for your help!
Below is for BigQuery Standard SQL
#standardSQL
SELECT company_id, country,
total_revenue / (COUNT(1) OVER(PARTITION BY company_id)) AS total_revenue
FROM `project.dataset.table`
Applied to the sample data from your question, the output is
Row company_id country total_revenue
1 1 Russia 1200.0
2 2 Croatia 600.0
3 2 Italy 600.0
4 3 USA 400.0
5 3 UK 400.0
6 3 Italy 400.0
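For readers less familiar with window functions, the `COUNT(1) OVER (PARTITION BY company_id)` step amounts to "count the rows per company, then divide each row by that count". A small Python sketch of the same arithmetic, with the hypothetical helper name `split_revenue`:

```python
from collections import Counter

# Rows mirror the sample table: (company_id, country, total_revenue)
offices = [
    (1, "Russia", 1200),
    (2, "Croatia", 1200),
    (2, "Italy", 1200),
    (3, "USA", 1200),
    (3, "UK", 1200),
    (3, "Italy", 1200),
]

def split_revenue(rows):
    # Number of rows (countries) per company, like COUNT(1) OVER (PARTITION BY ...).
    per_company = Counter(company_id for company_id, _, _ in rows)
    # Divide each row's revenue by the number of rows for its company.
    return [
        (company_id, country, revenue / per_company[company_id])
        for company_id, country, revenue in rows
    ]

for row in split_revenue(offices):
    print(row)
```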
I’m looking for a little assistance. I have a table called equipment. One row is an order of some type of equipment.
Here are the fields:
num_id date player_id order_id active jersey comment
BIGINT DATE BIGINT BIGINT CHAR(1) CHAR(3) VARCHAR(1024)
11 2018-01-01 123 1 Y XL
11 2018-01-01 123 2 Y M Purple
11 2018-01-01 123 3 Y L White, Red
13 2018-01-11 456 1 N S Yellow, Light Blue
14 2018-02-01 789 1 Y M Orange, Black
15 2018-02-02 101 1 Y XL Shield
15 2018-02-02 101 2 Y XL Light Green, Grey
I need to write a query that shows one row for each month with the columns
Month
Total Orders
Total Products ordered
And one extra column for a total count of each size sold.
Is this easy? Any help would be appreciated.
EDIT: To answer people's questions below, SQL Server is the dbms. My apologies. As well, I am struggling as I don't know how to get the month from a date. And then adding the column for size counts has me baffled, but I haven't fully investigated that portion. I feel like the rest I have done individually, just never did it in one succinct query.
It looks weird here and I don't know how to add a table to stackoverflow, so I'll try to make it a little more visually appealing here:
The end goal I think would be like this:
Month Total Orders Total Products Ordered Size Count
January 1 3 S-0, M-1, L-1, XL-2
February 3 6 S-1, M-2, L-1, XL-3
Or this:
Month Total Orders Total Products Ordered S Count M Count L Count XL Count
January 1 3 0 1 1 2
February 3 6 1 2 1 3
You need PIVOT.
It basically turns rows into columns, which is exactly your case.
https://www.codeproject.com/Tips/500811/Simple-Way-To-Use-Pivot-In-SQL-Query
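The shape of the result can be sketched in Python before writing the query. The sketch assumes that `num_id` identifies an order and that each row is one product, which the question does not state explicitly, and it ignores the `active` flag. The name `monthly_summary` is hypothetical. In SQL Server the per-size columns are commonly produced either with PIVOT or with conditional aggregation such as `SUM(CASE WHEN jersey = 'S' THEN 1 ELSE 0 END)` grouped by month.

```python
from collections import defaultdict
from datetime import date

# Rows mirror the sample equipment table: (num_id, date, jersey)
equipment = [
    (11, date(2018, 1, 1), "XL"),
    (11, date(2018, 1, 1), "M"),
    (11, date(2018, 1, 1), "L"),
    (13, date(2018, 1, 11), "S"),
    (14, date(2018, 2, 1), "M"),
    (15, date(2018, 2, 2), "XL"),
    (15, date(2018, 2, 2), "XL"),
]

SIZES = ("S", "M", "L", "XL")

def monthly_summary(rows):
    """Return {(year, month): (total_orders, total_products, size_counts)}."""
    months = defaultdict(
        lambda: {"orders": set(), "products": 0, "sizes": dict.fromkeys(SIZES, 0)}
    )
    for num_id, order_date, jersey in rows:
        bucket = months[(order_date.year, order_date.month)]
        bucket["orders"].add(num_id)   # assumption: num_id identifies an order
        bucket["products"] += 1        # assumption: one row is one product
        bucket["sizes"][jersey] += 1   # one counter per size, i.e. the pivot
    return {
        key: (len(b["orders"]), b["products"], b["sizes"])
        for key, b in sorted(months.items())
    }

summary = monthly_summary(equipment)
for key, value in summary.items():
    print(key, value)
```

The counts here come from the sample rows, so they differ from the illustrative numbers in the question's end-goal tables.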
Perhaps my title is misleading, but I am not sure how else to phrase this. I have two tables, tblL and tblDumpER. They are joined based on the field SubjectNumber. This is a one (tblL) to many (tblDumpER) relationship.
I need to write a query that will give me, for all my subjects, a value from tblDumpER associated with a date in tblL. This is to say:
SELECT tblL.SubjectNumber, tblDumpER.ER_Q1
FROM tblL
LEFT JOIN tblDumpER ON tblL.SubjectNumber=tblDumpER.SubjectNumber
WHERE tblL.RandDate=tblDumpER.ER_DATE And tblDumpER.ER_Q1 Is Not Null
This is straightforward enough. My problem is the value RandDate from tblL is different for every subject. However, it needs to be displayed as Day1 so I can have tblDumpER.ER_Q1 as Day1 for every subject. Then I need RandDate+1 As Day2, etc until I hit either null or Day84. The 'dumb' solution is to write 84 queries. This is obviously not practical. Any advice would be greatly appreciated!
I appreciate the responses so far but I don't think that I'm explaining this correctly so here is some example data:
SubjectNumber RandDate
1001 1/1/2013
1002 1/8/2013
1003 1/15/2013
SubjectNumber ER_DATE ER_Q1
1001 1/1/2013 5
1001 1/2/2013 6
1001 1/3/2013 2
1002 1/8/2013 1
1002 1/9/2013 10
1002 1/10/2013 8
1003 1/15/2013 7
1003 1/16/2013 4
1003 1/17/2013 3
Desired outcome:
(Where Day1=RandDate, Day2=RandDate+1, Day3=RandDate+2)
SubjectNumber Day1_ER_Q1 Day2_ER_Q1 Day3_ER_Q1
1001 5 6 2
1002 1 10 8
1003 7 4 3
This data is then going to be plotted on a graph with Day# on the X-axis and ER_Q1 on the Y-axis
I would do this in two steps:
Create a query that gets the MIN date for each SubjectNumber
Join this query to your existing query, so you can perform a DATEDIFF calculation on the MIN date and the date of the current record.
I'm not entirely sure of what it is that you need, but perhaps a calendar table would be of help. Just create a local table that contains all of the days of the year in it, then use that table to JOIN your dates up?
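The date-difference idea can be sketched in Python to show the intended result: compute each record's day number as the offset from the subject's RandDate plus one, then use that number as the column name. The function name `pivot_by_day` and the `max_day` parameter are hypothetical. In Access, the same day number could feed a crosstab query, though that is not shown here.

```python
from datetime import date

# tblL: SubjectNumber -> RandDate
rand_dates = {
    1001: date(2013, 1, 1),
    1002: date(2013, 1, 8),
    1003: date(2013, 1, 15),
}

# tblDumpER: (SubjectNumber, ER_DATE, ER_Q1)
er_rows = [
    (1001, date(2013, 1, 1), 5),
    (1001, date(2013, 1, 2), 6),
    (1001, date(2013, 1, 3), 2),
    (1002, date(2013, 1, 8), 1),
    (1002, date(2013, 1, 9), 10),
    (1002, date(2013, 1, 10), 8),
    (1003, date(2013, 1, 15), 7),
    (1003, date(2013, 1, 16), 4),
    (1003, date(2013, 1, 17), 3),
]

def pivot_by_day(rand_dates, er_rows, max_day=84):
    """Return {SubjectNumber: {"DayN_ER_Q1": value}} with Day1 = RandDate."""
    table = {subject: {} for subject in rand_dates}
    for subject, er_date, q1 in er_rows:
        start = rand_dates.get(subject)
        if start is None or q1 is None:
            continue  # no RandDate for this subject, or a null ER_Q1
        day = (er_date - start).days + 1  # RandDate itself is Day1
        if 1 <= day <= max_day:
            table[subject][f"Day{day}_ER_Q1"] = q1
    return table

result = pivot_by_day(rand_dates, er_rows)
for subject, days in result.items():
    print(subject, days)
```

Because every subject's dates are converted to the same Day1..Day84 scale, the output can be plotted directly with the day number on the X-axis.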