Please forgive my woefully limited understanding of SQL, but I'm hoping someone can help me. I need to alter a query written by someone else some time ago.
The query displays consumption per industry for a variety of industries in a number of areas. The table it spits out currently looks something like this:
+---------------+----------+---------+
| Economic area | Industry | Total |
+---------------+----------+---------+
| Area1 | | |
| | Ind1 | 459740 |
| | Ind2 | 43000 |
| | Ind3 | 0 |
| | Total | 502740 |
| Area2 | | |
| | Ind1 | 725560 |
| | Ind2 | 111017 |
| | Ind3 | 277577 |
| | Total | 1114154 |
+---------------+----------+---------+
Unfortunately, this table in conjunction with another table we publish on the number of producers in each industry and area can reveal commercially sensitive information when there are very few producers. For instance, in the table below, there's only one producer in Industry 2 in Area 1, so everything in the above table consumed by industry 2 in Area 1 goes to that producer.
+---------------+---------+------+------+------+
| Economic area | County | Ind1 | Ind2 | Ind3 |
+---------------+---------+------+------+------+
| Area1 | | | | |
| | county1 | 1 | 0 | 0 |
| | county2 | 3 | 1 | 2 |
| | county3 | 1 | 0 | 0 |
| | Total: | 5 | 1 | 2 |
| | | | | |
| Area2 | county4 | 5 | 0 | 1 |
| | county5 | 3 | 3 | 1 |
| | county6 | 1 | 0 | 1 |
| | county7 | 0 | 0 | 0 |
| | Total: | 9 | 3 | 3 |
+---------------+---------+------+------+------+
What I've been asked to do is produce a condensed version of the first table that looks like the one below, where industries that have fewer than 3 producers in an area are aggregated into a generic Other Industry. Something like this:
+---------------+----------+--------+
| Economic area | Industry | All |
+---------------+----------+--------+
| Area1 | | |
| | Ind1 | 459740 |
| | OtherInd | 121376 |
| | Total | 581116 |
| Area2 | | |
| | Ind1 | 725560 |
| | Ind2 | 111017 |
| | Ind3 | 244 |
| | Total | 836821 |
+---------------+----------+--------+
I have been searching for a while, but haven't been able to find anything that works, or that I can understand well enough to make it work. I tried using a Count(Case(industry_code<3,1,0))... but I'm working in MS Access, so that doesn't work. I thought about using an IIF or a Switch statement, but it doesn't seem like either of those allows for the right type of comparison. I also found a suggestion to use a FROM clause with two different groupings, but Access spat out an error when I tried it.
The only marginal success I've had is with a HAVING (((Count(Allmills.industry_code))>3)), but it just drops the problem industries completely.
Currently, a somewhat simplified version of the query looks like this:
SELECT
economic_areas.economic_area AS [Economic area],
Industry_codes.industry_heading AS Industry,
Sum(Allmills.consumption) AS [All],
Sum(Allmills.[WA origin logs]) AS Washington,
Allmills.industry_code,
Count(Allmills.industry_code) AS CountOfindustry_code,
Sum(Allmills.industry_code) AS SumOfindustry_code
FROM ((economic_areas INNER JOIN Allmills ON (economic_areas.state_abbrev =
Allmills.state_abbrev)
AND (economic_areas.economic_area_code = Allmills.economic_area_code))
INNER JOIN Industry_codes ON Allmills.display_industry_code =
Industry_codes.industry_code)
WHERE (((Allmills.economic_area_code) Is Not Null))
GROUP BY Allmills.display_industry_economic_area_code,
Allmills.display_industry_code, economic_areas.economic_area,
Industry_codes.industry_heading, Allmills.industry_code
ORDER BY Allmills.display_industry_economic_area_code,
Allmills.display_industry_code;
Any help would be greatly appreciated, even just suggestions of what types of techniques might be useful that I can look into elsewhere - I'm just running in circles right now.
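HAVING really is the solution here: change your query to use HAVING with > 3, add another query with HAVING <= 3, then UNION ALL the results of the two. (If the cutoff is “fewer than 3 producers”, as in the question, the conditions become >= 3 and < 3.)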
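A minimal sketch of the HAVING plus UNION ALL idea, run against SQLite through Python's sqlite3 module rather than MS Access. The `mills` table, its columns, and the sample numbers are hypothetical stand-ins for the joined Allmills data, and the cutoff of 3 follows the question's "fewer than 3 producers" wording. Access SQL differs in places, so treat this as an illustration of the technique rather than a drop-in query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified stand-in for Allmills: one row per producer (mill).
conn.executescript("""
CREATE TABLE mills (area TEXT, industry TEXT, consumption INTEGER);
INSERT INTO mills VALUES
  ('Area1', 'Ind1', 100), ('Area1', 'Ind1', 200), ('Area1', 'Ind1', 50),
  ('Area1', 'Ind2', 40),
  ('Area1', 'Ind3', 30), ('Area1', 'Ind3', 20);
""")

query = """
-- Industries with at least 3 producers keep their own row.
SELECT area, industry, SUM(consumption) AS total
FROM mills
GROUP BY area, industry
HAVING COUNT(*) >= 3
UNION ALL
-- Industries with fewer than 3 producers are summed per area under one label.
SELECT area, 'OtherInd' AS industry, SUM(total) AS total
FROM (
    SELECT area, industry, SUM(consumption) AS total
    FROM mills
    GROUP BY area, industry
    HAVING COUNT(*) < 3
) AS small
GROUP BY area
ORDER BY area, industry
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

With this sample data, Ind2 (one producer) and Ind3 (two producers) are folded into a single OtherInd row for Area1, while Ind1 stays separate.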
I'm designing a database for a workout tracker app. Each user should be able to track multiple workouts (routines). A workout can have multiple exercises, and an exercise can be used in many workouts. Each exercise will have a specific track type (weight and reps, distance and time, only reps).
My tables so far:
| User | |
|------|-------|
| id | name |
| 1 | Ilka |
| 2 | James |
| Exercise | | |
|----------|---------------------|---------------|
| id | name | track_type_id |
| 1 | Barbell Bench Press | 1 |
| 2 | Squats | 1 |
| 3 | Deadlifts | 1 |
| 4 | Rowing Machine | 3 |
| Workout | | |
|---------|---------|-----------------|
| id | user_id | name |
| 1 | 1 | Chest & Triceps |
| 2 | 1 | Legs |
| Workout_Exerice (Junction table) | |
|-----------------|------------------|------------|
| id | exersice_id | workout_id |
| 1 | 1 | 1 |
| 2 | 2 | 1 |
| 3 | 4 | 1 |
| Workout_Sets | | | |
|--------------|---------------------|------|--------|
| id | workout_exersice_id | reps | weight |
| 1 | 1 | 12 | 120 |
| 2 | 1 | 10 | 120 |
| 3 | 1 | 8 | 120 |
| 4 | 2 | 10 | 220 |
| 5 | 3 | null | null |
| TrackType | |
|-----------|-----------------|
| id | name |
| 1 | Weight and Reps |
| 2 | Reps Only |
| 3 | Distance Time |
My issue is how to incorporate the TrackType table for each workout set. My first option was to create columns in the Workout_Sets table for each tracking type (weight and reps, distance and time, only reps), but that means many rows will contain many nulls. Another option I considered was an EAV-type table, but I'm not sure. Also, do you think my design is efficient, or is it over-normalized?
I would say that the most efficient way is to have nulls in your table. The alternative would require you to split many of the categories into separate tables. I also recommend that you start factoring a User ID table into your database.
Your description states that “Each exercise will have a specific track type”, suggesting that each exercise has exactly one track type (a many-to-one relationship from Exercise to TrackType) and that the relationship is unchanging. As such, the Exercise table should have a TrackType column, which is what your track_type_id column already provides.
I suspect, however, that your problem description may be lacking specificity, making it difficult to give you sound advice. For instance, if the TrackType can vary for any given exercise, your TrackType column may belong on the Workout_Sets table. If the relationship between TrackType and Exercise/Workout_Sets is many-to-many, then you will need another junction table.
Your question regarding “over-normalization” depends upon many factors that are specific to your solution. In general, I would say no - the degree of normalization appears to be appropriate.
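To make the nullable-columns option concrete, here is a small sketch using SQLite through Python's sqlite3 module. The `distance` and `duration_seconds` columns are assumptions, since the question's Workout_Sets table only shows reps and weight, and the set rows are linked straight to an exercise (skipping the junction table) purely to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE track_type (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE exercise (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    track_type_id INTEGER NOT NULL REFERENCES track_type(id)
);
-- One nullable column per measurable quantity; unused ones stay NULL.
CREATE TABLE workout_set (
    id INTEGER PRIMARY KEY,
    exercise_id INTEGER NOT NULL REFERENCES exercise(id),
    reps INTEGER,
    weight REAL,
    distance REAL,              -- hypothetical column
    duration_seconds INTEGER    -- hypothetical column
);
INSERT INTO track_type VALUES
  (1, 'Weight and Reps'), (2, 'Reps Only'), (3, 'Distance Time');
INSERT INTO exercise VALUES
  (1, 'Barbell Bench Press', 1), (4, 'Rowing Machine', 3);
INSERT INTO workout_set (exercise_id, reps, weight) VALUES (1, 12, 120);
INSERT INTO workout_set (exercise_id, distance, duration_seconds) VALUES (4, 2000, 480);
""")

# The track type comes from the exercise, so it tells the app which columns to read.
set_rows = conn.execute("""
SELECT e.name, tt.name, s.reps, s.weight, s.distance, s.duration_seconds
FROM workout_set s
JOIN exercise e ON e.id = s.exercise_id
JOIN track_type tt ON tt.id = e.track_type_id
ORDER BY s.id
""").fetchall()
for row in set_rows:
    print(row)
```

Each set row leaves the columns that do not apply to its track type as NULL, which is the trade-off the first answer accepts.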
I have created a database which stores some simulations results in this way:
+--------------+--------+--------+-------------+--------------+
| SimulationID | TypeID | UserID | InputListID | OutputListID |
+--------------+--------+--------+-------------+--------------+
| 1 | 1 | 1 | 1 | 11 |
+--------------+--------+--------+-------------+--------------+
| 2 | 1 | 1 | 2 | 12 |
+--------------+--------+--------+-------------+--------------+
| 3 | 1 | 1 | 3 | 13 |
+--------------+--------+--------+-------------+--------------+
Where InputListID refers to other tables where groups of variables are stored. Each simulation can have a different number of parameters. If I simply use a SELECT that joins the simulations with the variables, I get the following table:
+--------------+--------------+---------------+---------------+
| SimulationID | VariableName | VariableValue | VariableUnits |
+--------------+--------------+---------------+---------------+
| 1 | Young Mod | 100 | GPa |
+--------------+--------------+---------------+---------------+
| 1 | Poisson | 0.3 | NULL |
+--------------+--------------+---------------+---------------+
| 2 | Young Mod | 101 | GPa |
+--------------+--------------+---------------+---------------+
| 2 | Poisson | 0.25 | NULL |
+--------------+--------------+---------------+---------------+
Is it possible to recursively concatenate columns to have something like this:
+--------------+-----------+---------+
| SimulationID | Young Mod | Poisson |
+--------------+-----------+---------+
| 1 | 100 | 0.3 |
+--------------+-----------+---------+
| 2 | 101 | 0.25 |
+--------------+-----------+---------+
Where the number of columns could change depending on the number of parameters, and on whether units, for example, are wanted as well?
I am a beginner in SQL, so I apologize in advance if the question seems silly!
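One common way to get this shape is conditional aggregation, with the column list built at run time because the set of variables is not fixed. The sketch below uses SQLite through Python's sqlite3 module. The `sim_vars` table is a hypothetical stand-in for the joined result shown in the question, and the question does not say which database is in use, so other systems may offer a dedicated PIVOT feature instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table holding the long-format result from the question.
conn.executescript("""
CREATE TABLE sim_vars (
    SimulationID INTEGER, VariableName TEXT,
    VariableValue REAL, VariableUnits TEXT
);
INSERT INTO sim_vars VALUES
  (1, 'Young Mod', 100, 'GPa'), (1, 'Poisson', 0.3, NULL),
  (2, 'Young Mod', 101, 'GPa'), (2, 'Poisson', 0.25, NULL);
""")

# Discover the variable names, in order of first appearance.
names = [r[0] for r in conn.execute(
    "SELECT VariableName FROM sim_vars GROUP BY VariableName ORDER BY MIN(rowid)"
)]

# One MAX(CASE ...) column per variable; names are bound as parameters and
# only used (with quotes escaped) as column aliases.
columns = ", ".join(
    'MAX(CASE WHEN VariableName = ? THEN VariableValue END) AS "{}"'.format(
        n.replace('"', '""')
    )
    for n in names
)
sql = (
    f"SELECT SimulationID, {columns} FROM sim_vars "
    "GROUP BY SimulationID ORDER BY SimulationID"
)
pivot = conn.execute(sql, names).fetchall()
for row in pivot:
    print(row)
```

Units could be added the same way, with a second generated column per variable that selects VariableUnits instead of VariableValue.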
I changed the context a bit, but it's basically the same issue.
Imagine we are in a never-ending tunnel, shaped like a circle. We split the circle into sections numbered 1 to 10, and we'll call each section a slot (sl). There are 2 groups (gr) of living things walking in the tunnel. Each group has 2 bands, each of which has a name and global hitpoints (hp). Every group is walking forward (although the bands might change order). If a group is at slot #10 and moves forward, it will be at slot #1. We snapshot their information every day. All the data gathered is stored in a table with this structure:
+--------+--------------+----------------+--------------+--------------+----------------+--------------+--------------+----------------+--------------+--------------+----------------+--------------+
| day_id | gr_1_sl_1_id | gr_1_sl_1_name | gr_1_sl_1_hp | gr_1_sl_2_id | gr_1_sl_2_name | gr_1_sl_2_hp | gr_2_sl_1_id | gr_2_sl_1_name | gr_2_sl_1_hp | gr_2_sl_2_id | gr_2_sl_2_name | gr_2_sl_2_hp |
+--------+--------------+----------------+--------------+--------------+----------------+--------------+--------------+----------------+--------------+--------------+----------------+--------------+
| 1      | 3            | orc            | 100          | 4            | goblin         | 10           | 10           | human          | 50           | 1            | dwarf          | 25           |
| 2      | 6            | goblin         | 7            | 7            | orc            | 76           | 2            | human          | 60           | 3            | dwarf          | 28           |
+--------+--------------+----------------+--------------+--------------+----------------+--------------+--------------+----------------+--------------+--------------+----------------+--------------+
As you can see, the columns are structured in a sequential way, while the data shows what is the actual value. What I want is to have the information shaped this way instead:
+---------+-------+-------+-----------+---------+
| id_game | gr_id | sl_id | band_name | band_hp |
+---------+-------+-------+-----------+---------+
| 1 | 1 | 3 | orc | 100 |
| 1 | 1 | 4 | goblin | 10 |
| 1 | 2 | 10 | human | 50 |
| 1 | 2 | 1 | dwarf | 25 |
| 2 | 1 | 6 | goblin | 7 |
| 2 | 1 | 7 | orc | 76 |
| 2 | 2 | 2 | human | 60 |
| 2 | 2 | 3 | dwarf | 28 |
+---------+-------+-------+-----------+---------+
I have this information in Power BI, although I can create views in SQL Server if need be. I have tried many things; the closest I got was unpivoting and parsing the original columns to get day_id, gr_id, sl_id, attributes and values. The attributes and values columns basically hold name and hp with their corresponding value (I changed hp into a string), but then I'm stuck; I'm not sure what to do next.
Does anyone have any ideas? Keep in mind that I oversimplified the problem; there are more groups, more slots, more bands and more statistics (e.g. attack and defense rating, etc.)
You seem to want to unpivot the table. In SQL Server, I recommend using apply:
select t.day_id as id_game, v.*
from t cross apply
     -- one row per group/slot pair: group number, then that slot's id, name and hp
     (values (1, gr_1_sl_1_id, gr_1_sl_1_name, gr_1_sl_1_hp),
             (1, gr_1_sl_2_id, gr_1_sl_2_name, gr_1_sl_2_hp),
             (2, gr_2_sl_1_id, gr_2_sl_1_name, gr_2_sl_1_hp),
             (2, gr_2_sl_2_id, gr_2_sl_2_name, gr_2_sl_2_hp)
     ) v(gr_id, sl_id, band_name, band_hp);
In other databases, you can do something similar with union all.
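As an illustration of the UNION ALL variant, here is a sketch that runs on SQLite through Python's sqlite3 module. The table name `t` follows the answer; the helper column `pos` (the slot position within the group) is an assumption added only to keep the output in a stable order. The branches are generated in a loop, which may help when there are many more groups and slots than in the simplified example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Recreate the wide table from the question: one id/name/hp triple per group and slot.
pairs = [(g, s) for g in (1, 2) for s in (1, 2)]
wide_cols = ", ".join(
    f"gr_{g}_sl_{s}_id INTEGER, gr_{g}_sl_{s}_name TEXT, gr_{g}_sl_{s}_hp INTEGER"
    for g, s in pairs
)
conn.execute(f"CREATE TABLE t (day_id INTEGER, {wide_cols})")
conn.executemany(
    "INSERT INTO t VALUES (" + ", ".join("?" * 13) + ")",
    [
        (1, 3, "orc", 100, 4, "goblin", 10, 10, "human", 50, 1, "dwarf", 25),
        (2, 6, "goblin", 7, 7, "orc", 76, 2, "human", 60, 3, "dwarf", 28),
    ],
)

# One SELECT per group/slot pair, glued together with UNION ALL.
branches = [
    f"SELECT day_id AS id_game, {g} AS gr_id, {s} AS pos, "
    f"gr_{g}_sl_{s}_id AS sl_id, gr_{g}_sl_{s}_name AS band_name, "
    f"gr_{g}_sl_{s}_hp AS band_hp FROM t"
    for g, s in pairs
]
sql = (
    "SELECT id_game, gr_id, sl_id, band_name, band_hp FROM ("
    + " UNION ALL ".join(branches)
    + ") ORDER BY id_game, gr_id, pos"
)
long_rows = conn.execute(sql).fetchall()
for row in long_rows:
    print(row)
```

Each branch reads the whole table once, so for very wide tables the apply approach above is likely to be cheaper where it is available.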
I am trying to create a manually calculated column where I keep track of a current inventory.
Currently, I have a table that looks like this:
| Group | Part | Operation Type | Transaction Amount |
|--------------|------|----------------|--------------------|
| Concrete | A | STOCK | 100 |
| Concrete | A | Buy | 25 |
| Concrete | A | Make | -10 |
| Concrete | A | Make | -10 |
| Concrete | A | Make | -10 |
| Concrete | A | Make | -10 |
| Concrete | A | Make | -10 |
| Concrete | B | STOCK | 10 |
| Concrete | B | Make | -10 |
| Concrete | B | Make | -10 |
| Concrete | B | Make | -10 |
| Concrete | B | Make | -10 |
| Concrete | B | Make | 150 |
| Construction | C | STOCK | 10 |
| Construction | C | Make | -1 |
| Construction | C | Make | -1 |
| Construction | C | Make | -1 |
| Construction | C | Make | -1 |
| Construction | D | STOCK | 5 |
| Construction | D | Make | -5 |
The table is first ordered by group then by part, and then STOCK is always shown as the first value. The idea is to create a new manually calculated column, curr_inventory, that allows for us to keep track of current inventory and see if or when our inventory for a given part, for a given group, dips below 0.
Ideally, the end results would look like this:
| Group | Part | Operation Type | Transaction Amount | New_Inventory_Column |
|:------------:|:----:|:--------------:|:------------------:|:--------------------:|
| Concrete | A | STOCK | 100 | 100 |
| Concrete | A | Buy | 25 | 125 |
| Concrete | A | Make | -10 | 115 |
| Concrete | A | Make | -10 | 105 |
| Concrete | A | Make | -10 | 95 |
| Concrete | A | Make | -10 | 85 |
| Concrete | A | Make | -10 | 75 |
| Concrete | B | STOCK | 10 | 10 |
| Concrete | B | Make | -10 | 0 |
| Concrete | B | Make | -10 | -10 |
| Concrete | B | Make | -10 | -20 |
| Concrete | B | Make | -10 | -30 |
| Concrete | B | Make | 150 | 120 |
| Construction | C | STOCK | 10 | 10 |
| Construction | C | Make | -1 | 9 |
| Construction | C | Make | -1 | 8 |
| Construction | C | Make | -1 | 7 |
| Construction | C | Make | -1 | 6 |
| Construction | D | STOCK | 5 | 5 |
| Construction | D | Make | -5 | 0 |
The end result would be a column that initiates when the part number has changed and the operation type is STOCK, and then begins to calculate (using the transaction amount) what the current inventory is.
I am not sure where to start on a SQL query that would allow for this. Intuitively, the pseudocode would look something like:
for each row in table:
if operation_type == "stock":
curr_inv = stock.value
else:
curr_inv = previous_curr_inv + transaction_amount
However, I am not sure how to even begin writing SQL for this. I typically try to post what SQL I am working with but I don't even know where to begin. I have looked at various posts online, on SO, including posts like this, and this, and this, but I could not see how the selected answers could be used as a solution.
I used the window function to calculate the running total.
I added the row_number column in the subquery.
Try this:
select t1."Group",t1."Part",t1."Operation Type", t1."Transaction Amount",
sum(t1."Transaction Amount") over (partition by t1."Group",t1."Part" order by t1.rownumber)
from (
select row_number() over (order by null) as rownumber, t.*
from test t ) t1
SQL tables represent unordered sets. To put things in order, you need a column with the ordering. In the code below, I will use ? for this column.
Then you want a cumulative sum:
select t.*,
       -- the transaction amounts are already signed (Make rows are negative), so sum them as-is
       sum(case when operation_type in ('STOCK', 'Buy', 'Make') then amount
                else 0
           end) over (partition by "group", part order by ?) as curr_inventory
from t;
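A runnable sketch of the cumulative sum with an explicit ordering column, using SQLite through Python's sqlite3 module (window functions need SQLite 3.25 or later). The `inventory` table, the `seq` column that plays the role of `?`, and the shortened sample rows are assumptions made for illustration. The group column is named `grp` here to avoid the reserved word.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# seq is a hypothetical ordering column; INTEGER PRIMARY KEY numbers rows as inserted.
conn.executescript("""
CREATE TABLE inventory (
    seq INTEGER PRIMARY KEY,
    grp TEXT, part TEXT, operation_type TEXT, amount INTEGER
);
INSERT INTO inventory (grp, part, operation_type, amount) VALUES
  ('Concrete', 'A', 'STOCK', 100),
  ('Concrete', 'A', 'Buy',    25),
  ('Concrete', 'A', 'Make',  -10),
  ('Concrete', 'B', 'STOCK',  10),
  ('Concrete', 'B', 'Make',  -10),
  ('Concrete', 'B', 'Make',  -10);
""")

# The running total restarts for each (grp, part) and follows seq within it.
running = conn.execute("""
SELECT grp, part, operation_type, amount,
       SUM(amount) OVER (PARTITION BY grp, part ORDER BY seq) AS curr_inventory
FROM inventory
ORDER BY seq
""").fetchall()
for row in running:
    print(row)
```

Because the STOCK row is the first row of each part, the running total starts from the stock level without any special case for that operation type.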
I have an ACTIVE_TRANSPORTATION table:
+--------+----------+--------+
| ATN_ID | TYPE | LENGTH |
+--------+----------+--------+
| 1 | SIDEWALK | 20.6 |
| 2 | SIDEWALK | 30.1 |
| 3 | TRAIL | 15.9 |
| 4 | TRAIL | 40.4 |
| 5 | SIDEWALK | 35.2 |
| 6 | TRAIL | 50.5 |
+--------+----------+--------+
It is related to an INSPECTION table via the ATN_ID:
+---------+--------+------------------+
| INSP_ID | ATN_ID | LENGTH_INSPECTED |
+---------+--------+------------------+
| 101 | 2 | 15.2 |
| 102 | 3 | 5.4 |
| 103 | 5 | 15.9 |
| 104 | 6 | 20.1 |
+---------+--------+------------------+
I want to summarize the information like this:
+----------+--------+-------------------+
| TYPE | LENGTH | PERCENT_INSPECTED |
+----------+--------+-------------------+
| SIDEWALK | 85.9 | 36% |
| TRAIL | 106.8 | 23% |
+----------+--------+-------------------+
How can I do this within a single query?
Here is the updated answer using Access 2010. Note that LENGTH is reserved in Access, so the output column is named LENGTH_ instead:
SELECT
TYPE,
SUM(LENGTH) as LENGTH_,
SUM(IIF(ISNULL(LENGTH_INSPECTED),0, LENGTH_INSPECTED))/SUM(LENGTH) as PERCENT_INSPECTED
FROM
ACTIVE_TRANSPORTATION A
LEFT JOIN INSPECTION B
ON A.ATN_ID = B.ATN_ID
GROUP BY TYPE
Here is my original answer, using T-SQL in SQL Server 2014:
SELECT SUM(LENGTH) as LENGTH,
SUM(ISNULL(LENGTH_INSPECTED,0))/SUM(LENGTH) as PERCENT_INSPECTED,
TYPE
FROM
ACTIVE_TRANSPORTATION A
LEFT JOIN INSPECTION B
ON A.ATN_ID = B.ATN_ID
GROUP BY TYPE
Let me know if you need it to be converted to percent, rounded, etc, but I'm guessing that part is easy for you.
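For the percent conversion and rounding mentioned above, a sketch along these lines might do. It runs on SQLite through Python's sqlite3 module, so COALESCE stands in for ISNULL and IIF(ISNULL(...)); the table and column names are taken from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ACTIVE_TRANSPORTATION (ATN_ID INTEGER PRIMARY KEY, TYPE TEXT, LENGTH REAL);
CREATE TABLE INSPECTION (INSP_ID INTEGER PRIMARY KEY, ATN_ID INTEGER, LENGTH_INSPECTED REAL);
INSERT INTO ACTIVE_TRANSPORTATION VALUES
  (1, 'SIDEWALK', 20.6), (2, 'SIDEWALK', 30.1), (3, 'TRAIL', 15.9),
  (4, 'TRAIL', 40.4), (5, 'SIDEWALK', 35.2), (6, 'TRAIL', 50.5);
INSERT INTO INSPECTION VALUES
  (101, 2, 15.2), (102, 3, 5.4), (103, 5, 15.9), (104, 6, 20.1);
""")

# Multiply by 100.0 before dividing and round to one decimal place.
summary = conn.execute("""
SELECT A.TYPE,
       ROUND(SUM(A.LENGTH), 1) AS LENGTH_,
       ROUND(100.0 * SUM(COALESCE(B.LENGTH_INSPECTED, 0)) / SUM(A.LENGTH), 1)
           AS PERCENT_INSPECTED
FROM ACTIVE_TRANSPORTATION A
LEFT JOIN INSPECTION B ON A.ATN_ID = B.ATN_ID
GROUP BY A.TYPE
ORDER BY A.TYPE
""").fetchall()
for row in summary:
    print(row)
```

Note that 25.5 / 106.8 is about 23.9%, so the 23% shown for TRAIL in the question looks truncated rather than rounded. The join also assumes at most one inspection per ATN_ID; with several inspections per segment, SUM(LENGTH) would count that segment's length more than once.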