Defining a new variable in SELECT clause in SQL developer - sql

I am new to SQL and the extraction of data from databases, so please bear with me. I only have experience with coding in statistical programs, including Stata, SAS, and R.
I currently have a SELECT clause that extracts a table from an Oracle database.
To simplify the question, I use an illustrative example.
I am interested in creating a new variable that contains the weight of each person's mother. It is not included in the database and must be derived from the other variables. Since I am new to SQL, I do not know whether this is possible in the SELECT clause or whether more efficient options exist.
Note that:
- Number and Mother_number refer to the same set of numbers, meaning that both mothers and daughters are represented in the data.
- AA (number 1) and CC (number 3) have the same mother, BB (number 2).
- I need to convert the date, e.g. to_char(a.from_date, 'dd-mm-yyyy') as fromdate, since otherwise the year gets confused with the day of the month.
The SQL code:
select to_char(a.from_date, 'dd-mm-yyyy') as fromdate, a.Name, a.Weight, a.Number, a.Mother_number
from table1 a, table2 b
where 1=1
and a.family_ref=b.family_ref
and .. (other conditions)
What I currently obtain:
| fromdate | Name | Weight | Number | Mother_number |
|------------|------|--------|--------|---------------|
| 06-07-2021 | AA | 100 | 1 | 2 |
| 06-07-2021 | BB | 200 | 2 | 3 |
| 06-07-2021 | CC | 300 | 3 | 2 |
| 06-07-2021 | DD | 400 | 4 | 5 |
| 06-07-2021 | EE | 500 | 5 | 6 |
| ... | ... | ... | ... | ... |
What I wish to obtain:
| fromdate | Name | Weight | Number | Mother_number | Mother_weight |
|------------|------|--------|--------|---------------|---------------|
| 06-07-2021 | AA | 100 | 1 | 2 | 200 |
| 06-07-2021 | BB | 200 | 2 | 3 | 300 |
| 06-07-2021 | CC | 300 | 3 | 2 | 200 |
| 06-07-2021 | DD | 400 | 4 | 5 | 500 |
| 06-07-2021 | EE | 500 | 5 | 6 | … |
| ... | ... | ... | ... | ... | ... |

Assuming the MOTHER_NUMBER value references the same values as the NUMBER variable, just join the table with itself.
select a.fromdate
, a.name
, a.weight
, a.number
, a.mother_number
, b.weight as mother_weight
from HAVE a
left join HAVE b
on a.mother_number = b.number
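To see the self-join in action, here is a minimal sketch in Python using the standard sqlite3 module with the question's sample rows. The in-memory SQLite database, the table name have, and the column types are assumptions for illustration only; the original data lives in Oracle, where NUMBER is a reserved word and would need quoting.

```python
import sqlite3

# In-memory SQLite database standing in for the Oracle table (assumption:
# table name "have" and column types are illustrative, not from the source).
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE have (fromdate TEXT, name TEXT, weight INTEGER, '
    '"number" INTEGER, mother_number INTEGER)'
)
conn.executemany(
    "INSERT INTO have VALUES (?, ?, ?, ?, ?)",
    [
        ("06-07-2021", "AA", 100, 1, 2),
        ("06-07-2021", "BB", 200, 2, 3),
        ("06-07-2021", "CC", 300, 3, 2),
        ("06-07-2021", "DD", 400, 4, 5),
        ("06-07-2021", "EE", 500, 5, 6),
    ],
)

# Self-join: alias b is "the mother's row" for each row in alias a.
# LEFT JOIN keeps rows whose mother is not in the table (weight is NULL).
mother_rows = conn.execute(
    """
    SELECT a.name, a.weight, a."number", a.mother_number,
           b.weight AS mother_weight
    FROM have a
    LEFT JOIN have b ON a.mother_number = b."number"
    ORDER BY a."number"
    """
).fetchall()

for row in mother_rows:
    print(row)
```

EE's mother (number 6) is not in the sample, so the left join returns NULL (None in Python) for that row instead of dropping it.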

Although I'm not sure I'm following the "mother" logic, the way to implement the last column in your SELECT statement is to add b.weight as Mother_Weight at the end of the first line, before the FROM keyword.
Since the b table references "Mothers", you can add the column just by taking the weight of the person in table b.
If instead you wish to add the data of a person's mother's weight, you can do that by adding a column to the relevant table and then updating each row in your table by executing the statements below:
ALTER TABLE table1 ADD Mother_weight FLOAT;
UPDATE table1 SET Mother_weight=(SELECT (Weight) FROM table2 WHERE table1.family_ref=table2.family_ref);
Then you add the a.Mother_weight clause in your SELECT statement.

Use a hierarchical query:
SELECT to_char(a.fromdate, 'dd-mm-yyyy') as fromdate,
a.Name,
a.Weight,
a."NUMBER",
a.Mother_number,
PRIOR weight AS mother_weight
FROM table1 a
INNER JOIN table2 b
ON (a.family_ref=b.family_ref)
WHERE LEVEL = 2
OR ( LEVEL = 1
AND NOT EXISTS(
SELECT 1
FROM table1 x
WHERE a.mother_number = x."NUMBER"
)
)
CONNECT BY NOCYCLE
PRIOR "NUMBER" = mother_number
AND PRIOR a.family_ref = a.family_ref
ORDER BY a."NUMBER"
Or, a sub-query factoring clause and a self-join:
WITH data (fromdate, name, weight, "NUMBER", mother_number) AS (
SELECT to_char(a.fromdate, 'dd-mm-yyyy'),
a.Name,
a.Weight,
a."NUMBER",
a.Mother_number
FROM table1 a
INNER JOIN table2 b
ON (a.family_ref=b.family_ref)
)
SELECT d.*,
m.weight AS mother_weight
FROM data d
LEFT OUTER JOIN data m
ON (d.mother_number = m."NUMBER")
ORDER BY d."NUMBER"
Which, for the sample data:
CREATE TABLE table1 (family_ref, fromdate, Name, Weight, "NUMBER", Mother_number) AS
SELECT 1, DATE '2021-07-06', 'AA', 100, 1, 2 FROM DUAL UNION ALL
SELECT 1, DATE '2021-07-06', 'BB', 200, 2, 3 FROM DUAL UNION ALL
SELECT 1, DATE '2021-07-06', 'CC', 300, 3, 2 FROM DUAL UNION ALL
SELECT 1, DATE '2021-07-06', 'DD', 400, 4, 5 FROM DUAL UNION ALL
SELECT 1, DATE '2021-07-06', 'EE', 500, 5, 6 FROM DUAL;
CREATE TABLE table2 (family_ref) AS
SELECT 1 FROM DUAL;
Both output:
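| FROMDATE | NAME | WEIGHT | NUMBER | MOTHER_NUMBER | MOTHER_WEIGHT |
|------------|------|--------|--------|---------------|---------------|
| 06-07-2021 | AA | 100 | 1 | 2 | 200 |
| 06-07-2021 | BB | 200 | 2 | 3 | 300 |
| 06-07-2021 | CC | 300 | 3 | 2 | 200 |
| 06-07-2021 | DD | 400 | 4 | 5 | 500 |
| 06-07-2021 | EE | 500 | 5 | 6 | |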

Related

Bigquery: Joining 2 tables one having repeated records and one with count ()

I want to join two tables after unnesting the arrays in Table 1, but the records are duplicated after the join because of the unnest.
Table:1
| a | d.b | d.c |
-----------------
| 1 | 5 | 2 |
- -------------
| | 3 | 1 |
-----------------
| 2 | 2 | 1 |
Table:2
| a | c | f |
-----------------
| 1 | 12 | 13 |
-----------------
| 2 | 14 | 15 |
I want to join table 1 and 2 on a but I need also to have the output of:
| a | d.b | d.c | f | h | Sum(count(a))
---------------------------------------------
| 1 | 5 | 2 | 13 | 12 |
- ------------- - - 1
| | 3 | 1 | | |
---------------------------------------------
| 2 | 2 | 1 | 15 | 14 | 1
a can be repeated in table 2, so I need to count(a) and then take the sum after the join.
My problem is that after the join I need the nested and repeated record to be the same as in the first table. When I use aggregation to get the sum, I can't group by structs or arrays, so I UNNEST the records first and then use the ARRAY_AGG function, but then the sum comes out wrong.
SELECT
t1.a,
t2.f,
t2.h,
ARRAY_AGG(DISTINCT(t1.db)) as db,
ARRAY_AGG(DISTINCT(t1.dc)) as dc,
SUM(t2.total) AS total
FROM (
SELECT
a,
d.b as db,
d.c as dc
FROM
`table1`,
UNNEST(d) AS d,
) AS t1
LEFT JOIN (
SELECT
a,
f,
h,
COUNT(*) AS total,
FROM
`table2`
GROUP BY
a,f,h) AS t2
ON
t1.a = t2.a
GROUP BY
1,
2,
3
Note: the error is in the total after the sum, which is much higher than expected. All other data are correct.
I guess column a in your table 2 is not unique.
Let's assume that table 2 looks like this:
| a | c | f |
|---|-----|-----|
| 1 | 12 | 13 |
| 2 | 14 | 15 |
| 1 | 100 | 101 |
There are two rows where a is 1. Since their other columns differ, the grouping (GROUP BY a,f,h) AS t2 does not merge them, and COUNT(*) AS total is 1 for each row.
| a | c | f | total |
|---|-----|-----|-------|
| 1 | 12 | 13 | 1 |
| 2 | 14 | 15 | 1 |
| 1 | 100 | 101 | 1 |
In the next step you join this table to your table 1. The rows of table1 with value 1 in column a are duplicated, because table2 has two matching entries. This is why the sum is too high.
Instead of unnesting the tables, I recommend the following approach:
-- Creating of sample data as given:
with tbl_A as (select 1 a, [struct(5 as b,2 as c),struct(3,1)] d union all select 2,[struct(2,1)] union all select null,[struct(50,51)]),
tbl_B as (select 1 as a,12 b, 13 f union all select 2,14,15 union all select 1,100,101 union all select null,500,501)
-- Query:
select *
from tbl_A A
left join
(Select a,array_agg(struct(b,f)) as B, count(1) as counts from tbl_B group by 1) B
on ifnull(A.a,-9)=ifnull(B.a,-9)
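The fan-out described above can be reproduced on a small scale. The sketch below uses Python's standard sqlite3 module rather than BigQuery, and the table names t1 (table 1 after unnesting) and t2 are assumptions for illustration. It compares the sum obtained when table 2 is grouped by all its columns with the sum obtained when both sides are reduced to one row per a before joining.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- t1 mimics table 1 after UNNEST(d): one row per array element.
    CREATE TABLE t1 (a INTEGER, db INTEGER, dc INTEGER);
    CREATE TABLE t2 (a INTEGER, c INTEGER, f INTEGER);
    INSERT INTO t1 VALUES (1, 5, 2), (1, 3, 1), (2, 2, 1);
    INSERT INTO t2 VALUES (1, 12, 13), (2, 14, 15), (1, 100, 101);
    """
)

# Grouping t2 by every column leaves two rows for a = 1, and t1 also has
# two rows for a = 1, so the join yields 2 x 2 rows and the sum is inflated.
fan_out = conn.execute(
    """
    SELECT t1.a, SUM(g.total)
    FROM t1
    LEFT JOIN (SELECT a, c, f, COUNT(*) AS total
               FROM t2 GROUP BY a, c, f) AS g ON t1.a = g.a
    GROUP BY t1.a
    ORDER BY t1.a
    """
).fetchall()

# Reducing both sides to one row per key before joining avoids the fan-out.
one_row_per_key = conn.execute(
    """
    SELECT k.a, g.total
    FROM (SELECT DISTINCT a FROM t1) AS k
    LEFT JOIN (SELECT a, COUNT(*) AS total
               FROM t2 GROUP BY a) AS g ON k.a = g.a
    ORDER BY k.a
    """
).fetchall()

print(fan_out)          # inflated total for a = 1
print(one_row_per_key)  # true number of t2 rows per a
```

The same idea underlies the BigQuery query above: table 2 is collapsed to one row per a (with the details kept in an array) before it is joined.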

tSQL aggregate functions and group bys

I can't seem to find a way to get this to work. What I need is as follows.
I have a table of, let's say, types.
Type_ID, Type_Description
Then I have a table of Items. [Item Type is fk to type table]
Item_ID, Item_Type
Then I have a Results Table. [Item_ID is fk to Item table]
Result_ID, Item_ID, Cost
So what I need for output, grouped by Type_ID, is:
- The Count of Items(can be 0),
- The Count of Results(Can be 0),
- and the sum of Cost(can be 0)
I don't have direct access to these tables. I have to build the SQL and send it to an API, so I never see the error message: I get the results if the query succeeds and an error 500 if it does not.
It seems to be an older T-SQL version, as LIST and STRING_AGG don't seem to be available.
EDIT: As requested - Sample Data
+---------+------------------+
| Type_ID | Type_Description |
+---------+------------------+
| 1 | Example 1 |
+---------+------------------+
| 2 | Example 2 |
+---------+------------------+
+---------+---------------+
| ITEM_ID | ITEM_TYPE |
+---------+---------------+
| 1 | 1 |
+---------+---------------+
| 2 | 1 |
+---------+---------------+
| 3 | 1 |
+---------+---------------+
| 4 | 2 |
+---------+---------------+
| 5 | 2 |
+---------+---------------+
+-----------+---------+------+
| Result_ID | Item_ID | Cost |
+-----------+---------+------+
| 1 | 1 | 10 |
+-----------+---------+------+
| 2 | 1 | 20 |
+-----------+---------+------+
| 3 | 2 | 5 |
+-----------+---------+------+
| 4 | 5 | 100 |
+-----------+---------+------+
Desired Output
+---------+------------+--------------+------+
| Type_ID | Item_Count | Result_Count | Cost |
+---------+------------+--------------+------+
| 1 | 3 | 3 | 35 |
+---------+------------+--------------+------+
| 2 | 2 | 1 | 100 |
+---------+------------+--------------+------+
I think GMB's answer was quite good, but in case there is a type with no items (something in your requirements), it will not be displayed.
So first of all let's create the input data:
select 1 as Type_ID, 'Example 1' as Type_Description into #type
union all
select 2, 'Example 2'
union all
select 3, 'Example 3'
select 1 Result_ID, 1 Item_ID, 10 Cost into #item
union all
select 2, 1, 20 Cost
union all
select 3, 2, 5 Cost
union all
select 4, 5, 100 Cost
select 1 Item_ID, 1 Item_Type INTO #item_type
union all
select 2, 1
union all
select 3, 1
union all
select 4, 2
union all
select 5, 2
Note I added also a type 3 with no items to test the no item case.
And then the query you need:
SELECT
t.Type_ID,
COUNT(DISTINCT it.Item_ID) Item_Count,
COUNT(DISTINCT i.Result_ID) Result_Count,
SUM(ISNULL(Cost, 0)) Cost
FROM #type t
LEFT JOIN #item_type it on it.Item_Type = t.Type_ID
LEFT JOIN #item i on i.Item_ID = it.Item_ID
GROUP BY t.Type_ID
I think it is pretty straightforward and doesn't need much explanation, but feel free to ask in the comments if necessary.
The results are just like you requested, with also a line for type 3:
+---------+------------+--------------+------+
| Type_ID | Item_Count | Result_Count | Cost |
+---------+------------+--------------+------+
| 1 | 3 | 3 | 35 |
+---------+------------+--------------+------+
| 2 | 2 | 1 | 100 |
+---------+------------+--------------+------+
| 3 | 0 | 0 | 0 |
+---------+------------+--------------+------+
You mentioned that the count of items could be 0 and also the count of results. But aren't both values always either 0, or both > 0? Only in case your type-item many-to-many table doesn't have a FK, you could have that scenario. For example, if I add:
insert into #item_type
select 6, 3
Then the last row is:
+---------+------------+--------------+------+
| 3 | 1 | 0 | 0 |
+---------+------------+--------------+------+
I am not sure if that makes sense in your scenario, but as your post implies that items and results can be 0 independently, that confused me a bit.
You can join and aggregate. Assuming that your tables are called types, types_costs and costs, that would be:
select
t.type_id,
count(distinct tc.item_id) item_count,
count(distinct c.result_id) result_count,
sum(c.cost) cost
from types t
inner join types_costs tc on tc.item_type = t.type_id
left join costs c on c.item_id = tc.item_id
group by t.type_id
The important thing is to use a left join to bring in the costs table, so that item_ids that do not exist in costs are not eliminated before you get a chance to count them. Depending on your actual use case, you might also want a left join on table types_costs.
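As a quick sanity check of the join-and-aggregate pattern, here is a small sketch in Python using the standard sqlite3 module with the question's sample data. The table names (types, items, results), the extra type 3 without items, and the use of SQLite instead of T-SQL are assumptions for illustration; COALESCE plays the role of ISNULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE types   (type_id INTEGER, type_description TEXT);
    CREATE TABLE items   (item_id INTEGER, item_type INTEGER);
    CREATE TABLE results (result_id INTEGER, item_id INTEGER, cost INTEGER);
    -- Type 3 has no items; it is added only to exercise the zero case.
    INSERT INTO types VALUES (1, 'Example 1'), (2, 'Example 2'), (3, 'Example 3');
    INSERT INTO items VALUES (1, 1), (2, 1), (3, 1), (4, 2), (5, 2);
    INSERT INTO results VALUES (1, 1, 10), (2, 1, 20), (3, 2, 5), (4, 5, 100);
    """
)

# LEFT JOINs keep types without items and items without results.
# COUNT(DISTINCT ...) undoes the row multiplication caused by the joins,
# and COALESCE turns the NULL sum of an empty group into 0.
summary = conn.execute(
    """
    SELECT t.type_id,
           COUNT(DISTINCT i.item_id)   AS item_count,
           COUNT(DISTINCT r.result_id) AS result_count,
           COALESCE(SUM(r.cost), 0)    AS cost
    FROM types t
    LEFT JOIN items i   ON i.item_type = t.type_id
    LEFT JOIN results r ON r.item_id = i.item_id
    GROUP BY t.type_id
    ORDER BY t.type_id
    """
).fetchall()

for row in summary:
    print(row)
```

Summing cost is safe here because each result row appears exactly once after the two joins; only the item rows are repeated, which is why the item count needs DISTINCT.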

How to combine rows and columns?

I have two tables that I need to combine.
This is the first table.
----------------------------
| row_no | Part No |Qty_A |
----------------------------
| 1 | A | 100 |
| 2 | A | 300 |
----------------------------
Second table.
----------------------------
| row_no | Part No |Qty_B |
----------------------------
| 1 | A | 400 |
| 2 | B | 200 |
----------------------------
This is the result I want:
--------------------------------------
| row_no | Part No | Qty_A | Qty_B |
--------------------------------------
| 1 | A | 100 | 400 |
| 2 | A | 300 | - |
| 2 | B | - | 200 |
--------------------------------------
The two tables are joined on the "row_no" and "Part_no" columns.
I tried to use "LEFT OUTER JOIN", but the results are not as expected.
SELECT t1.row_no ,t1.part_no ,t1.Qty_A ,t2.Qty_B
FROM
(SELECT 1 as row_no,'A' as part_no,100 as Qty_A) as t1
LEFT OUTER JOIN
(SELECT 1 as row_no, 'B' as part_no,200 as Qty_B) as t2
ON t1.row_no = t2.row_no and t1.part_no = t2.part_no
Sorry for my unclear example.
Update
This is an example from a large transaction table.
I need to group it by Part_no and re-arrange it by row number as shown above.
Try below query with union all:
select row_no ,part_no ,Qty_A , '-' as Qty_B from tableA
union all
select row_no ,part_no ,'-' as Qty_A , Qty_B from tableb
or you can try with full outer join:
SELECT t1.row_no ,t1.part_no ,t1.Qty_A ,t2.Qty_B
FROM
(SELECT 1 as row_no,'A' as part_no,100 as Qty_A) as t1
full OUTER JOIN
(SELECT 1 as row_no, 'B' as part_no,200 as Qty_B) as t2
ON t1.row_no = t2.row_no and t1.part_no = t2.part_no
The UNION operator is used to combine the result-set of two or more SELECT statements.
- Each SELECT statement within UNION must have the same number of columns
- The columns must also have similar data types
- The columns in each SELECT statement must also be in the same order
The first query in the union statement defines the column names.
So in your case you could use:
select row_no ,part_no ,Qty_A , null as Qty_B from table1
union all
select row_no ,part_no , null, Qty_B from table2
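On its own, the UNION ALL returns one row per source row (four rows for the sample data). One way to collapse it to one row per (row_no, part_no), as in the desired result, is to wrap it in a GROUP BY. The following is only a sketch under assumptions: it uses Python's standard sqlite3 module, hypothetical table names table1 and table2, and a MAX aggregate that was not part of the answers above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE table1 (row_no INTEGER, part_no TEXT, qty_a INTEGER);
    CREATE TABLE table2 (row_no INTEGER, part_no TEXT, qty_b INTEGER);
    INSERT INTO table1 VALUES (1, 'A', 100), (2, 'A', 300);
    INSERT INTO table2 VALUES (1, 'A', 400), (2, 'B', 200);
    """
)

# Stack both tables with NULL placeholders, then keep one row per key.
# MAX ignores NULLs, so it picks the single real quantity where one exists
# and stays NULL where neither branch supplied a value.
combined = conn.execute(
    """
    SELECT row_no, part_no, MAX(qty_a) AS qty_a, MAX(qty_b) AS qty_b
    FROM (
        SELECT row_no, part_no, qty_a, NULL AS qty_b FROM table1
        UNION ALL
        SELECT row_no, part_no, NULL, qty_b FROM table2
    )
    GROUP BY row_no, part_no
    ORDER BY row_no, part_no
    """
).fetchall()

for row in combined:
    print(row)
```

This gives the same shape as a FULL OUTER JOIN on (row_no, part_no), and it also works on engines that lack full joins. It assumes each key appears at most once per table; with repeated keys, SUM may be a better fit than MAX.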

BigQuery - Find the closest region

I have two tables, and for each region in A, I want to find the closest regions in B.
A:
------------------------
ID | Start | End | Color
------------------------
1 | 400 | 500 | White
------------------------
1 | 10 | 20 | Red
------------------------
2 | 2 | 10 | Blue
------------------------
4 | 88 | 90 | Color
------------------------
B:
------------------------
ID | Start | End | Name
------------------------
1 | 1 | 2 | XYZ1
------------------------
1 | 50 | 60 | XYZ4
------------------------
2 | 150 | 160 | ABC1
------------------------
2 | 50 | 60 | ABC2
------------------------
4 | 100 | 120 | EFG
------------------------
RS:
---------------------------------------
ID | Start | End | Color | Closest Name
---------------------------------------
1 | 400 | 500 | White | XYZ4
---------------------------------------
1 | 10 | 20 | Red | XYZ1
---------------------------------------
2 | 2 | 10 | Blue | ABC2
---------------------------------------
4 | 88 | 90 | Color | EFG
---------------------------------------
Currently, I first find min distance by joining two tables:
MinDist Table:
SELECT A.ID, A.Start, A.End,
MIN(CASE
WHEN (ABS(A.End-B.Start)>=ABS(A.Start - B.End))
THEN ABS(A.Start-B.End)
ELSE ABS(A.End - B.Start)
END) AS distance
FROM ( Select A ... )
Join B On A.ID=B.ID)
Group By A.ID, A.Start, A.End
Then I recompute the distance by joining tables A and B again,
GlobDist Table (Note, the query retrieves B.Name in this case):
SELECT A.ID, A.Start, A.End,
CASE
WHEN (ABS(A.End-B.Start)>=ABS(A.Start - B.End))
THEN ABS(A.Start-B.End)
ELSE ABS(A.End - B.Start)
END AS distance,
B.Name
FROM ( Select A ... )
Join B On A.ID=B.ID)
Finally join these two tables MinDist and GlobDist Tables on
GlobDist.ID= MinDist.ID,
GlobDist.Start=MinDist.Start,
GlobDist.End= MinDist.End,
GlobDist.distance= MinDist.distance.
I tested ROW_NUMBER() and PARTITION BY over (ID, Start, End), but it took much longer. So, what's the fastest and most efficient way of solving this problem? How can I reduce duplicate computation?
Thanks!
The solution below is for BigQuery Standard SQL and is as short as this:
#standardSQL
SELECT a_id, a_start, a_end, color,
ARRAY_AGG(name ORDER BY POW(ABS(a_start - b_start), 2) + POW(ABS(a_end - b_end), 2) LIMIT 1)[SAFE_OFFSET(0)] name
FROM A JOIN B ON a_id = b_id
GROUP BY a_id, a_start, a_end, color
-- ORDER BY a_id
You can test or play with the above using the dummy data from your question:
#standardSQL
WITH A AS (
SELECT 1 a_id, 400 a_start, 500 a_end, 'White' color UNION ALL
SELECT 1, 10, 20 , 'Red' UNION ALL
SELECT 2, 2, 10, 'Blue' UNION ALL
SELECT 4, 88, 90, 'Color'
), B AS (
SELECT 1 b_id, 1 b_start, 2 b_end, 'XYZ1' name UNION ALL
SELECT 1, 50, 60, 'XYZ4' UNION ALL
SELECT 2, 150, 160,'ABC1' UNION ALL
SELECT 2, 50, 60, 'ABC2' UNION ALL
SELECT 4, 100, 120,'EFG'
)
SELECT a_id, a_start, a_end, color,
ARRAY_AGG(name ORDER BY POW(ABS(a_start - b_start), 2) + POW(ABS(a_end - b_end), 2) LIMIT 1)[SAFE_OFFSET(0)] name
FROM A JOIN B ON a_id = b_id
GROUP BY a_id, a_start, a_end, color
ORDER BY a_id
with the result below:
Row a_id a_start a_end color name
1 1 400 500 White XYZ4
2 1 10 20 Red XYZ1
3 2 2 10 Blue ABC2
4 4 88 90 Color EFG

How can I include records based on when they first appear, and exclude other records based on a certain date?

I need to select criteria based on inclusions and exclusions of the same attributes and on different dates and I am having a hard time wrapping my head around how to do this. Here is a list of my criteria.
- The record was first added to the database with a transaction of 222 or 223 and an activity code of ‘A’.
- The record does not have a status code of 7.
- Records whose latest activity code is one of (A, V, W, J) for one of the transactions (109, 154, 982, 745) after 10/01/2009 should not be included in the results.
There are 2 tables involved with a join on the employee ID.
Table 1:
|id | statcode
| 1 | 1
| 2 | 3
| 3 | 7
| 4 | 2
Table 2:
|id | date | act_code | trans
| 1 | 1/1/17 | Z | 109
| 1 | 3/4/12 | A | 222
| 1 | 2/14/09 | A | 154
| 2 | 1/1/17 | A | 223
| 2 | 6/6/13 | V | 109
| 3 | 11/23/14 | A | 222
| 4 | 12/13/16 | X | 154
| 4 | 11/23/14 | W | 223
What I’d like to return is:
| id | statcode| date | act_code | trans
| 1 | 1 | 3/4/12 | A | 222
ID 2 would not be selected because the first trans is not one of the correct values. ID 3 would not be included due to an incorrect status code. ID 4 would not be selected because the latest act_code is not one of the correct values. Anyone have an idea as to how to go about this? Thanks in advance.
edit: Here is the query as attempted. It seems to return everything.
SELECT *
FROM firsttable a
join secondtable b on a.id=b.id
where exists (select id, min(date) from
secondtable
where c.TRANS in ('222','223') and (TRANS NOT IN ('109', '154', '982',
'745')
AND ACT_CODE NOT IN ('A', 'V', 'W', 'J') and date>= to_date('10/01/2009',
'MM/DD/YYYY'))
group by id)
and a.statcode <> '07'
;
Try this:
select a.id, a.statcode, b."date", b.act_code, b.trans
from table1 a join table2 b on (a.id=b.id)
WHERE b.trans in(222,223)
AND b.act_code='A'
AND a.statcode<>7
AND (SELECT NVL(max(c."date"),TO_DATE('01-JAN-1990','DD-MON-YYYY'))
FROM table2 c
WHERE c.id=b.id
AND c.act_code in ('A', 'V', 'W', 'J')
AND c.trans IN (109, 154, 982, 745)) < TO_DATE('01-OCT-2009','DD-MON-YYYY');
*Updated: now tested and working.
SELECT * FROM
tbl1 a INNER JOIN tbl2 b on a.id = b.id
where b.trans IN (222,223)
AND a.statcode <> 7
AND b.act_code = 'A'
AND NOT EXISTS(
SELECT 1 from tbl2 t2
WHERE t2.id = a.id
AND t2.act_code IN ('A','V','W','J')
AND t2.trans IN (109,154,982,745)
AND t2."date" > to_date('10/01/2009', 'MM/DD/YYYY'));
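The NOT EXISTS query can be checked against the question's sample rows. This is a minimal sketch using Python's standard sqlite3 module instead of Oracle. The column name txn_date (in place of "date") and the ISO date strings are assumptions made so that plain string comparison orders the dates correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tbl1 (id INTEGER, statcode INTEGER);
    CREATE TABLE tbl2 (id INTEGER, txn_date TEXT, act_code TEXT, trans INTEGER);
    INSERT INTO tbl1 VALUES (1, 1), (2, 3), (3, 7), (4, 2);
    INSERT INTO tbl2 VALUES
        (1, '2017-01-01', 'Z', 109),
        (1, '2012-03-04', 'A', 222),
        (1, '2009-02-14', 'A', 154),
        (2, '2017-01-01', 'A', 223),
        (2, '2013-06-06', 'V', 109),
        (3, '2014-11-23', 'A', 222),
        (4, '2016-12-13', 'X', 154),
        (4, '2014-11-23', 'W', 223);
    """
)

# Keep rows with trans 222/223 and act_code 'A' whose id has statcode <> 7,
# unless the same id also has an excluded activity/transaction combination
# dated after 2009-10-01.
selected = conn.execute(
    """
    SELECT a.id, a.statcode, b.txn_date, b.act_code, b.trans
    FROM tbl1 a
    INNER JOIN tbl2 b ON a.id = b.id
    WHERE b.trans IN (222, 223)
      AND a.statcode <> 7
      AND b.act_code = 'A'
      AND NOT EXISTS (
          SELECT 1
          FROM tbl2 t2
          WHERE t2.id = a.id
            AND t2.act_code IN ('A', 'V', 'W', 'J')
            AND t2.trans IN (109, 154, 982, 745)
            AND t2.txn_date > '2009-10-01'
      )
    ORDER BY a.id
    """
).fetchall()

print(selected)
```

ID 2 is dropped by the NOT EXISTS branch (its V/109 row is dated 2013), ID 3 by the status code, and ID 4 because its 223 row has act_code W rather than A.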