How can I query a table for transitive matches? - sql

Given the following tables:
Units:
| id | singular | plural |
|----|----------|--------|
| 3 | onion | onions |
| 4 | bag | bags |
| 5 | gram | grams |
| 6 | ml | ml |
| 7 | mm | mm |
and
Conversions:
| id | convert_from | convert_to | factor |
|----|--------------|------------|--------|
| 3 | 4 | 3 | 5 |
| 4 | 3 | 5 | 125 |
How could I obtain all possible conversion factors from (for example) bag (unit 4)?
I would expect the answers to resemble the form
| convert_from | convert_to | factor |
|--------------|------------|--------|
| 4 | 3 | 5 |
| 4 | 5 | 625 |
Caveats:
There is no guarantee about which column of the conversions table (convert_from, convert_to) a unit might appear in.
Conversions that transit through units 5, 6, or 7 should be ignored.
That is to say,
1->2->4->5 is valid, 1->2->4->5->7 is not.
A SQL solution (or re-architecting of the database to facilitate a SQL solution) would be ideal, but a code solution that makes multiple SQL queries would also be appreciated.
There will be other units in the units table that should be ignored if they do not form part of the conversion graph (or if they form part of the branch through an invalid transition (5, 6, or 7)). This is a simplified view.
Illustrative example
Ignoring SQL and retrieving the data for a moment, here's what I'm trying to achieve:
I want to build a system where users can store household products. A product has a unit associated with it. The unit might be an SI unit, such as mm, ml, g.. or it might be a discrete unit such as onion, or can.
Units can have relationships amongst themselves, so for example 1 can -> 330 ml.
The complexity of my question comes from the fact that the conversions for a single unit might be spread across many products.
Considering the can example again, we can have a product called pepsi (crate of 24) with the unit being crate, and another product called pepsi (can) with the unit of can.
When the user creates the pepsi (can) product, they provide the following conversion:
1 can -> 330 ml
Later, the user creates the pepsi (crate of 24) product, and provides the following conversion:
1 crate -> 24 can
Finally, the user asks the question "how much pepsi do I have?"
I'd like to be able to answer:
25 cans
1.0417 crates
8250 ml.
However, I don't know how to convert crates to ml.
Here's another example in illustrated form:
Edit:
Changed mms and mls to mm and ml. Not sure what I was thinking...
Added a diagram to help clarify what I'm looking for rather than the solution.

You can use a recursive CTE, assuming there are no cycles in the data.
I added an extra is_terminal column to identify the terminal units where you don't want to convert from anymore (5, 6, and 7). The query is:
with recursive
  e (convert_from, convert_to, factor, is_terminal) as (
    select id, id, 1, is_terminal from units where id = 4 -- bag
    union all
    select e.convert_from, c.convert_to, e.factor * c.factor, u.is_terminal
    from e
    join conversions c on c.convert_from = e.convert_to
    join units u on u.id = c.convert_to
    where not e.is_terminal
  )
select * from e where convert_from <> convert_to
Result:
convert_from convert_to factor is_terminal
------------ ---------- ------ -----------
4 3 5 false
4 5 625 true
See the running example at DB Fiddle. Here's the data script I used to test:
create table units (
  id int,
  is_terminal boolean
);
insert into units (id, is_terminal) values
  (3, false), (4, false),
  (5, true), (6, true), (7, true);
create table conversions (
  id int,
  convert_from int,
  convert_to int,
  factor int
);
insert into conversions (id, convert_from, convert_to, factor) values
  (3, 4, 3, 5),
  (4, 3, 5, 125);
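The query above only follows conversions in the stored direction, while the question's first caveat says a unit may appear in either column. One possible way to cover that, sketched here in SQLite through Python's sqlite3 module, is to add each conversion's reverse with the reciprocal factor and to track the visited path so the recursion cannot bounce back. The `edges` CTE, the `path` column, and the helper `conversions_from` are illustrative names, not part of the original answer, and `factor` is stored as a real here so reciprocals are not truncated.

```python
import sqlite3

SCHEMA = """
create table units (id int, is_terminal boolean);
insert into units (id, is_terminal) values
  (3, 0), (4, 0), (5, 1), (6, 1), (7, 1);
create table conversions (id int, convert_from int, convert_to int, factor real);
insert into conversions (id, convert_from, convert_to, factor) values
  (3, 4, 3, 5), (4, 3, 5, 125);
"""

QUERY = """
with recursive
  -- every stored conversion, plus its reverse with the reciprocal factor
  edges (src, dst, factor) as (
    select convert_from, convert_to, factor from conversions
    union all
    select convert_to, convert_from, 1.0 / factor from conversions
  ),
  e (convert_from, convert_to, factor, is_terminal, path) as (
    select id, id, 1.0, is_terminal, ',' || id || ',' from units where id = ?
    union all
    select e.convert_from, g.dst, e.factor * g.factor, u.is_terminal,
           e.path || g.dst || ','
    from e
    join edges g on g.src = e.convert_to
    join units u on u.id = g.dst
    where not e.is_terminal
      and instr(e.path, ',' || g.dst || ',') = 0  -- never revisit a unit
  )
select convert_from, convert_to, factor
from e
where convert_from <> convert_to
order by convert_to
"""


def conversions_from(conn, unit_id):
    """Return (convert_from, convert_to, factor) rows reachable from unit_id."""
    return conn.execute(QUERY, (unit_id,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
for row in conversions_from(conn, 4):  # bag
    print(row)
```

Starting from onion (unit 3) instead, the same query walks the bag conversion backwards and reports a factor of 0.2. If the graph offers more than one route to a unit, this sketch returns one row per route.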

Related

SQLite Create Table and Insert & Update Value

While taking a Udemy SQL course, I got stuck on the problem below. Could you help me solve it? I find it very difficult.
Here's the question:
The goal of this exercise is to find out whether each one of the 4 numeric attributes in TemporalData is
positively or negatively correlated with sales.
Requirements:
Each example in the sample consists of a pair (Store, WeekDate). This means that Sales data must be summed across all departments before the correlation is computed.
Return a relation with schema (AttributeName VARCHAR(20), CorrelationSign Integer).
The values of AttributeName can be hardcoded string literals, but the values
of CorrelationSign must be computed automatically using SQL queries over
the given database instance.
You can use multiple SQL statements to compute the result. It might be of help to create and update your own tables.
Here's the sales table:
| store | dept | weekdate | weeklysales |
|-------|------|------------|-------------|
| 1 | 1 | 2010-07-01 | 2400.0 |
| 2 | 1 | 2010-07-02 | 40.0 |
Here's the temporaldata table:
| store | weekdate | temperature | fuelprice | cpi | unemploymentrate |
|-------|------------|-------------|-----------|----------|------------------|
| 1 | 2010-07-01 | 100.14 | 2.981 | 125.1222 | 9.402 |
| 2 | 2010-07-02 | 99.13 | 3.823 | 129.2912 | 14.81 |
So far I have written the code below. Could you show me how to solve this question?
DROP TABLE IF EXISTS CorrelationGroup;
CREATE TABLE CorrelationGroup(
  AttributeName VARCHAR(30),
  CorrelationSign INTEGER
);
-- Total sales over the joined rows
SELECT SUM(s.weeklysales)
FROM temporaldata t, sales s
WHERE s.weekdate = t.weekdate AND s.store = t.store;
-- Average of each numeric attribute
SELECT AVG(t.temperature) AS temp, AVG(t.fuelprice) AS fuel, AVG(t.cpi) AS cpi, AVG(t.unemploymentrate) AS unemploymentrate
FROM sales s, temporaldata t
WHERE s.store = t.store AND s.weekdate = t.weekdate;
-- Placeholder: the second value should be the computed sign
INSERT INTO CorrelationGroup VALUES ('Temp', 'The Value Of CorrelationSign');
-- 3 more
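One way to approach this, offered as a sketch rather than the course's expected solution: the sign of the Pearson correlation equals the sign of the covariance, so it is enough to sum (x - mean(x)) * (y - mean(y)) over the (Store, WeekDate) pairs and look at the sign. The example below runs that in SQLite through Python's sqlite3 module. The sample rows are made up for illustration (only the first rows echo the question), and the CTE names `totals`, `pairs`, `means`, and `cov` are my own.

```python
import sqlite3

ATTRIBUTES = ("temperature", "fuelprice", "cpi", "unemploymentrate")

SETUP = """
CREATE TABLE sales (store INT, dept INT, weekdate TEXT, weeklysales REAL);
CREATE TABLE temporaldata (store INT, weekdate TEXT, temperature REAL,
                           fuelprice REAL, cpi REAL, unemploymentrate REAL);
CREATE TABLE CorrelationGroup (AttributeName VARCHAR(20), CorrelationSign INTEGER);
-- Made-up sample rows for illustration, not the course data.
INSERT INTO sales VALUES
  (1, 1, '2010-07-01', 2400.0), (1, 2, '2010-07-01', 600.0),
  (2, 1, '2010-07-02', 40.0), (3, 1, '2010-07-03', 1000.0);
INSERT INTO temporaldata VALUES
  (1, '2010-07-01', 100.14, 2.981, 125.1222, 9.402),
  (2, '2010-07-02', 99.13, 3.823, 129.2912, 14.81),
  (3, '2010-07-03', 99.50, 3.100, 126.0, 10.0);
"""

SIGN_SQL = """
WITH totals AS (      -- one row per (store, weekdate), summed over departments
  SELECT store, weekdate, SUM(weeklysales) AS total
  FROM sales GROUP BY store, weekdate
), pairs AS (
  SELECT t.{attr} AS x, s.total AS y
  FROM totals s JOIN temporaldata t
    ON t.store = s.store AND t.weekdate = s.weekdate
), means AS (
  SELECT AVG(x) AS mx, AVG(y) AS my FROM pairs
), cov AS (           -- unnormalised covariance; only its sign matters
  SELECT SUM((x - mx) * (y - my)) AS c FROM pairs, means
)
INSERT INTO CorrelationGroup
SELECT ?, CASE WHEN c > 0 THEN 1 WHEN c < 0 THEN -1 ELSE 0 END FROM cov
"""


def fill_correlation_group(conn):
    """Insert one (AttributeName, CorrelationSign) row per attribute."""
    for attr in ATTRIBUTES:  # fixed whitelist, so formatting the column name is safe
        conn.execute(SIGN_SQL.format(attr=attr), (attr,))
    return conn.execute(
        "SELECT AttributeName, CorrelationSign FROM CorrelationGroup ORDER BY rowid"
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
signs = fill_correlation_group(conn)
print(signs)
```

The attribute names are hardcoded string literals, as the exercise allows, while each sign is computed by SQL from the data.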

SQL DB2 : How to JOIN dynamicaly 2 tables using "running total calculations"

I am learning SQL on a server: IBM V7R1M0, DB2.
I am trying to build a SQL report.
After searching for a similar example for several days, I am throwing this bottle into the ocean of knowledge...
Context:
The stores request goods from the warehouse.
Those goods are picked onto pallets.
Those pallets are put on a staging lane before being loaded onto a truck.
Rule 1: We want only pallets from one store on a staging lane (we don't want to mix pallets from different stores).
Rule 2: A store occupies staging lanes that are near each other.
Rule 3: Staging lanes are ordered by their ID (with gaps).
Table 1:
|-----|-----|-----------------|
| ID |store|pallet_estimation|
|-----|-----|-----------------|
| 1 | A | 35 |
| 2 | C | 2 |
| 3 | B | 30 |
|-----|-----|-----------------|
SELECT * FROM (
VALUES (1, 'A', 35), (2, 'C', 2), (3, 'B', 30)
) T1(ID, store, pallet_estimation)
Table 2 :
|---------------|---------------|
|ID_staging_lane|pallet_capacity|
|---------------|---------------|
| 201 | 10 |
| 202 | 10 |
| 204 | 30 |
| 205 | 40 |
| 208 | 30 |
| 210 | 30 |
|---------------|---------------|
SELECT * FROM(
VALUES (201, 10), (202, 10), (204, 30), (205, 40), (208, 30), (210, 30)
) T2(ID_staging_lane, pallet_capacity)
Expected result:
|-----------|--------|--------------------|---------------|------------------|
|T1_sequence|T1_store|T1_pallet_estimation|T2_staging_lane|T2_pallet_capacity|
|-----------|--------|--------------------|---------------|------------------|
| 1 | A | 35 | 201 | 10 |
| 1 | A | 35 | 202 | 10 |
| 1 | A | 35 | 204 | 30 |
| 2 | C | 2 | 205 | 40 |
| 3 | B | 30 | 208 | 30 |
|-----------|--------|--------------------|---------------|------------------|
Thank you, Charles, for your time.
I'll try to clarify my request.
If needed, I want to split the pallet_estimation across several staging lanes, following the sequence.
Example:
For store A which has 35 pallets,
I want to use staging lane 201 then it remains 35 - 10 = 25 ,
then I want to use staging lane 202 then it remains 25 - 10 = 15,
then I want to use staging lane 204 then it remains 15 - 30 = -15
then I want to continue with the store C on the next staging lane 205 then it remains 2 - 40 = -38
then I want to continue with the store B on the next staging lane 208 then it remains 30 - 30 = 0
How would you start to build that ?
- with window function ? SUM() OVER()
- with recursive SQL ? DECLARE FETCH
- is it possible to build a dynamic JOIN in SQL ?
- other idea ?
Thanks in advance,
Renaud
First of all, v7r1 is very old...10 years to be exact...
Secondly, I don't understand what you're trying to join on...I see nothing that would explain why store A ended up with 3 rows in your results.
Thirdly, there's no such thing as a "dynamic join" in any RDBMS. You can have a dynamic statement, which could include a join. Or you can have a static statement, which could also include a join. For Db2 on the IBM i, the distinction only matters if you're incorporating the statement in an RPG/COBOL program or an SQL stored procedure/function.
Now, having said all that, let me introduce you to Common Table Expressions (CTEs). A CTE is basically the same as a Nested Table Expression (NTE) but IMO easier to follow, and CTEs can also have a performance benefit over NTEs on the i.
with T1 as (
  SELECT * FROM (
    VALUES (1, 'A', 35), (2, 'C', 2), (3, 'B', 30)
  ) T1(ID, store, pallet_estimation)
), T2 as (
  SELECT * FROM (
    VALUES (201, 10), (202, 10), (204, 30), (205, 40), (208, 30), (210, 30)
  ) T2(ID_staging_lane, pallet_capacity)
), fitment as (
  select T1.*, T2.*, row_number() OVER(partition by ID_STAGING_LANE) as rowNbr
  from T1 join T2 on pallet_estimation <= pallet_capacity
)
select * from fitment where rowNbr = 1;
The with T1 as (<select statement>) is the common table expression; so are T2 and fitment. The with keyword is only used for the first CTE.
The fitment CTE joins T1 and T2 based on which estimate fits within the lane capacity, assigning a row_number to each possibility. The final select takes the first fit for each lane.
The nice thing about CTEs is that you can easily build them and see the results as you go along. At any point you can add select * from MYCTE and see what you have so far.
Note that, as shown, a CTE can reference another CTE (fitment references both T1 and T2).
EDIT
The functions you need to look forward or backward in the result set are named LAG() and LEAD(). They are part of the OLAP functionality built into Db2 for i. Unfortunately for you, they were added in 7.3.
You will need to roll your own version using a user-defined function (UDF) that makes use of what's known as the scratchpad to save data between the calls to the function for each row.
I found a very old article, Scribble on SQL's Scratchpad, showing how to use the scratchpad in RPG. You can also use it inside an SQL-defined UDF.
Do a bit of googling to see if you can get started. If you run into issues, create a new question here (or check out the Midrange mailing lists).
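To pin down the rule the question describes (fill consecutive lanes for a store until its estimate is covered, then move to the next store on the next free lane), here is a procedural sketch in Python. It is not Db2 code and not the scratchpad UDF mentioned above; the function name `assign_lanes` and the tuple layouts are assumptions made for illustration.

```python
def assign_lanes(stores, lanes):
    """Greedily assign staging lanes to stores.

    stores: list of (sequence, store, pallet_estimation), in processing order.
    lanes:  list of (lane_id, pallet_capacity), sorted by lane_id.
    Returns rows shaped like the question's expected result.
    """
    rows = []
    lane_iter = iter(lanes)
    for seq, store, pallets in stores:
        remaining = pallets
        while remaining > 0:
            lane = next(lane_iter, None)
            if lane is None:
                raise ValueError(f"not enough staging lanes for store {store}")
            lane_id, capacity = lane
            rows.append((seq, store, pallets, lane_id, capacity))
            remaining -= capacity  # a lane is never shared, so leftover space is lost
    return rows


stores = [(1, "A", 35), (2, "C", 2), (3, "B", 30)]
lanes = [(201, 10), (202, 10), (204, 30), (205, 40), (208, 30), (210, 30)]
for row in assign_lanes(stores, lanes):
    print(row)
```

With the sample data this reproduces the expected result from the question: lanes 201, 202, and 204 for store A, 205 for C, and 208 for B, leaving 210 unused.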

mapping of areas with multiple users

I have areas such as sector 1, sector 1 a, sector 1 b, and sector 1 c, and multiple cable operators who work in either a full sector (i.e. sector 1) or any of its sub-sectors. I have created a table of cable operators and want to map them to areas. If I set up the areas table with sector 1, sector 1 a, sector 1 b, and sector 1 c each having its own primary key, how can I reference these sectors in a single row of the cable operators table, given that we need to find the cable operators working in a particular sector?
My table structures are as follows:
Operators
| id | name
| 1 | 'abc'
| 2 | 'def'
| 3 | 'ghi'
areas
| id | name
| 1 | 'sector 1'
| 2 | 'sector 1a'
| 3 | 'sector 1b'
| 4 | 'sector 1c'
| 5 | 'sector 1d'
| 6 | 'sector 2'
| 7 | 'sector 2a'
| 8 | 'sector 2b'
| 9 | 'sector 2c'
| 10 | 'sector 2d'
I have an operatorsareas table where I have mapped operators to areas as follows:
operatorsareas
| op_id | area_id
| 1 | 1
| 2 | 1
| 3 | 1
| 1 | 7
| 2 | 8
| 3 | 7
Now I have used this query which gives me no result:
select o.id, o.name from operators as o
where not exists (select * from areas a where a.id in (1,7,8) and not exists (select * from operatorsareas as oa where oa.op_id = o.id
and oa.area_id = a.id))
I have taken the reference of following link:
SQL query through an intermediate table
I need a guidance regarding structuring of the tables.
Initial Problem
Your Sector/Subsector designation breaks 1NF:
Each domain [column] must be Atomic wrt the [datatypes available in the] platform.
That is a gross Normalisation error, which will have horrendous consequences downstream. The correction is:
Sector is one datum, one column, eg. Sector 1
SubSector is a separate datum, a separate column, eg. a, b, c
The Data • What is it ?
I need a guidance regarding structuring of the tables.
Ok. But what the data actually means, is not at all clear.
From the little info you have given, the following Predicates can be derived:
Each Operator is assigned to 0-or-1 Area
Assumption: an Operator cannot be in more than one place at a time
Assumption: an Operator may not be assigned
Each [assigned] Area is one of { Sector | SubSector | Unassigned }
AreaType is the Discriminator
Each Sector comprises 0-to-n SubSectors
Each Sector is occupied by 0-to-n Operators
Each SubSector is occupied by 0-to-n Operators
Please check and ensure that each is true (otherwise the data model is garbage).
Relational Data Model
Assuming those Predicates are correct, the Normalised Relational data model is:
Subtype • Exclusive
operators are working in either full sector or any of the sub sectors
What you are seeking in Logic terms is an OR Gate, in Relational terms, it is an Exclusive Subtype
Refer to Subtype for full details on Subtype implementation.
Note • Notation
All my data models are rendered in IDEF1X, the Standard for modelling Relational databases since 1993
My IDEF1X Introduction is essential reading for beginners or novices, wrt a Relational database.
The Query • What is it ?
Now I have used this query which gives me no result
We do not know what result set you are attempting to obtain.
At this point, it does not appear to be related to the linked Question & Answer.
Please explain what result set you would like to obtain, in English. Hopefully observing the given data model.
Supplying the required SQL would then be a simple matter.
Enjoy. Please feel free to ask specific questions.
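As a side note on the query in the question: the double NOT EXISTS pattern is relational division, meaning it returns only operators mapped to every listed area. With the sample mapping, no operator covers all of areas 1, 7, and 8, which would explain an empty result. The sketch below checks this in SQLite through Python's sqlite3 module, using the question's flat tables (not the normalised model discussed above) and only the three areas involved. The helper names `operators_in_all` and `operators_in_any` are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table operators (id integer primary key, name text);
create table areas (id integer primary key, name text);
create table operatorsareas (op_id int, area_id int);
insert into operators values (1, 'abc'), (2, 'def'), (3, 'ghi');
insert into areas values (1, 'sector 1'), (7, 'sector 2a'), (8, 'sector 2b');
insert into operatorsareas values (1, 1), (2, 1), (3, 1), (1, 7), (2, 8), (3, 7);
""")


def operators_in_all(conn, area_ids):
    """Relational division: operators mapped to every one of area_ids."""
    marks = ",".join("?" * len(area_ids))
    sql = f"""
        select o.id from operators o
        where not exists (
            select 1 from areas a
            where a.id in ({marks})
              and not exists (
                  select 1 from operatorsareas oa
                  where oa.op_id = o.id and oa.area_id = a.id))
        order by o.id"""
    return [row[0] for row in conn.execute(sql, area_ids)]


def operators_in_any(conn, area_ids):
    """Operators mapped to at least one of area_ids."""
    marks = ",".join("?" * len(area_ids))
    sql = f"""
        select distinct op_id from operatorsareas
        where area_id in ({marks})
        order by op_id"""
    return [row[0] for row in conn.execute(sql, area_ids)]


print(operators_in_all(conn, [1, 7, 8]))  # nobody covers all three areas
print(operators_in_all(conn, [1, 7]))
print(operators_in_any(conn, [7, 8]))
```

Which of the two shapes is wanted depends on the result set the question is actually after.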

Oracle11g, Multiple Pivots

So I have a working query that pivots some data for me.
SELECT * FROM (
select requisitions.ACC_ID AS "Accession #"
,tests.TEST_ID
,results.RESULT_NUMERIC
FROM requisitions
inner join req_panels ON requisitions.acc_id = req_panels.acc_id
inner join results ON req_panels.rp_id = results.rp_id
inner join tests ON results.test_id = tests.test_id
WHERE results.TEST_ID IN (1,2,3,4)
AND requisitions.RECEIVED_DATE > TO_DATE('9/1/2013', 'MM/DD/YYYY')
ORDER BY requisitions.ACC_ID
)
pivot(
MAX(RESULT_NUMERIC)
for TEST_ID IN ('1' AS Test1,'2' AS Test2,'3' AS Test3,'4' AS Test4)
)
Now, I have to include a different type of result (RESULTS_ALPHA in results table) as a column for each ACC_ID. RESULT_ALPHA is a clob. For the test_id's already included in the code above RESULTS_ALPHA is empty. But it holds a value for another test, we'll call it "TestAlpha".
So what I currently get from the code above is:
Acc_ID | Test 1 | Test 2 | Test 3 | Test 4
-------------------------------------------
000001 | 24 | 1.5 | 0.5 | 2.1
000002 | 15 | 2.1 | 0.3 | 1.3
And I need to get
Acc_ID | Test 1 | Test 2 | Test 3 | Test 4 | TestAlpha
--------------------------------------------------------
000001 | 24 | 1.5 | 0.5 | 2.1 | abcd
000002 | 15 | 2.1 | 0.3 | 1.3 | efgh
How can I accomplish this? Another pivot?
Thanks.
If you can use just the first 4000 characters of a CLOB field then you can just substring it:
SELECT * FROM (
select requisitions.ACC_ID AS "Accession #"
,tests.TEST_ID
,results.RESULT_NUMERIC
,dbms_lob.substr(results.RESULT_ALPHA, 4000, 1) as result_alpha
FROM requisitions
...
Of course that gives you a 4000-character-wide column in your output, but not much you can do about that really, unless you can set a lower length based on knowledge of what's in the column. (Though if it's less than 4K, it doesn't really make sense to store it as a CLOB; sounds like you have a mix of data in there though).
Even if the value is more than 4000 characters this will show the start of it. Whether that's acceptable, or useful, depends what you're doing with the result of the pivot.
What you're doing does seem to assume that the RESULTS_ALPHA is the same for all results records for each TEST_ID; or even for each ACC_ID. Which seems a bit wasteful if it's true.
I'm not sure if there is a non-programmatic solution to get the full CLOB back.
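For comparison, the same result shape can be produced without PIVOT by using conditional aggregation (MAX over a CASE per output column). The sketch below shows that swapped-in technique in SQLite through Python's sqlite3 module, not in Oracle. It assumes a single flattened results table instead of the four joined tables, uses test_id 5 as a hypothetical id for "TestAlpha", and uses substr in place of dbms_lob.substr.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table results (acc_id text, test_id int, result_numeric real, result_alpha text);
insert into results values
  ('000001', 1, 24,  null), ('000001', 2, 1.5, null),
  ('000001', 3, 0.5, null), ('000001', 4, 2.1, null),
  ('000001', 5, null, 'abcd'),   -- test_id 5 is a hypothetical id for TestAlpha
  ('000002', 1, 15,  null), ('000002', 2, 2.1, null),
  ('000002', 3, 0.3, null), ('000002', 4, 1.3, null),
  ('000002', 5, null, 'efgh');
""")

PIVOT_SQL = """
select acc_id,
       max(case when test_id = 1 then result_numeric end) as test1,
       max(case when test_id = 2 then result_numeric end) as test2,
       max(case when test_id = 3 then result_numeric end) as test3,
       max(case when test_id = 4 then result_numeric end) as test4,
       -- truncate the text first, mirroring the 4000-character substring above
       max(case when test_id = 5 then substr(result_alpha, 1, 4000) end) as testalpha
from results
group by acc_id
order by acc_id
"""

rows = conn.execute(PIVOT_SQL).fetchall()
for row in rows:
    print(row)
```

Each CASE yields NULL for rows belonging to other tests, and MAX ignores NULLs, so every accession collapses to one row with one column per test.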

Find out irregular entries with SQL

I have some human-error entries in my table. Some are missing a zero, some have more material than they should, and so on. So I am trying to scan the table to find errors within groups of entries.
Table goes like this:
| Work Order | Product | Material Qty
---------------------------------
| 1 | Item A | 10
| 2 | Item A | 25
| 3 | Item A | 12
| 4 | Item A | 9
| 5 | Item X | 52
| 6 | Item X | 20
| 7 | Item X | 23
| 8 | Item X | 24
| 9 | Item X | 2
| 10 | Item Z | 20
| 11 | Item Z | 5
---------------------------------
Note that the WO numbers and items are not actually sequential; I wrote them sequentially here only as an example.
As you can see, Item A should have a quantity of around 10, give or take. Item X should be around 22, give or take. Meanwhile, the query should tag all of Item Z as suspicious, since there is not enough data to compare against. So I need to isolate WO numbers 2, 5, 9, 10, and 11 for people to audit. Any idea how?
I have tried taking the average and using a percentage threshold to filter them out. But the percentages vary too much. And in the case of Item Z, there is not enough data to tell which numbers are normal and which are irregular, so I need to tag both for verification; in that case, reducing to a percentage won't help.
Also, if I reduce them to a percentage deviation from the average, the spread is still too wide to single one of them out.
Any ideas? Because I am really stuck this time.
From a statistical basis, you probably want to start with the STDEV standard deviation function.
select *
from
(
  select *,
    AVG(qty) OVER (PARTITION BY product) av,
    STDEV(qty) OVER (PARTITION BY product) sd,
    COUNT(*) OVER (PARTITION BY product) c
  from yourtable
) v
where ABS(qty - av) > sd or c < 3
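To see what that rule flags on the sample data, here is the same logic sketched in Python with the statistics module, whose stdev is the sample standard deviation. The function name `suspicious` and the `min_count` parameter are illustrative, not part of the query above.

```python
from collections import defaultdict
from statistics import mean, stdev

# (work_order, product, material_qty) rows from the question
ROWS = [
    (1, "Item A", 10), (2, "Item A", 25), (3, "Item A", 12), (4, "Item A", 9),
    (5, "Item X", 52), (6, "Item X", 20), (7, "Item X", 23), (8, "Item X", 24),
    (9, "Item X", 2), (10, "Item Z", 20), (11, "Item Z", 5),
]


def suspicious(rows, min_count=3):
    """Return work orders whose qty is more than one sample standard
    deviation from the product mean, plus every row of any product
    with fewer than min_count entries."""
    by_product = defaultdict(list)
    for _, product, qty in rows:
        by_product[product].append(qty)

    flagged = []
    for work_order, product, qty in rows:
        qtys = by_product[product]
        if len(qtys) < max(min_count, 2):  # stdev needs at least two values
            flagged.append(work_order)
        elif abs(qty - mean(qtys)) > stdev(qtys):
            flagged.append(work_order)
    return flagged


print(suspicious(ROWS))
```

On this data it flags work orders 2, 5, 9, 10, and 11, which is the set the question wants audited. A one-standard-deviation cutoff is fairly aggressive, so on larger groups it may be worth widening.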