I'm relatively inexperienced in SQL and could use some help beyond the usual SELECT and JOIN.
The Problem
Suppose you have 2 tables you wish to join in Microsoft SQL Server, but they lack a unique identifier, so the join generates duplicate entries. I've created an example SQLfiddle to demonstrate this using a small subset of the full database schema: http://sqlfiddle.com/#!18/df3fc.
One table has a list of measurement steps taken for 2 systems, identified by their serial. These measurement steps can contain multiple pieces of data, which are contained in the second table. This would not normally be an issue but, as in the sqlfiddle example for serial=1004, sometimes the same data may be retaken as part of a rework. When I then query, each piece of rework data gets joined to each step, duplicating data. The select query:
SELECT my_measurement_steps.id AS steps_id, my_measurement_steps.serial, my_measurement_data.id AS data_id, my_measurement_data.my_data, my_measurement_data.measurementid, my_measurement_steps.date
FROM my_measurement_steps INNER JOIN
my_measurement_data ON my_measurement_steps.serial = my_measurement_data.serial AND
my_measurement_steps.measurementid = my_measurement_data.measurementid
Desired Output
steps_id  serial  data_id  my_data        measurementid  date
--------  ------  -------  -------------  -------------  -----------------------
15        1004    36       0.9496555      33             2021-10-12 07:55:58.100
14        1004    35       -0.03252285    11             2021-10-07 07:56:31.530
14        1004    34       -0.0003081787  11             2021-10-07 07:56:31.530
13        1004    33       -0.01728721    10             2021-10-07 07:56:31.530
13        1004    32       -0.1996608     10             2021-10-07 07:56:31.530
12        1004    31       0.003044653    9              2021-10-07 07:24:49.500
12        1004    30       0.002392432    9              2021-10-07 07:24:49.500
11        1004    29       1.012242       8              2021-10-07 07:24:30.720
11        1004    28       1.003897       8              2021-10-07 07:24:30.720
11        1004    27       0.9917302      8              2021-10-07 07:24:30.720
11        1004    26       -0.002975781   8              2021-10-07 07:24:30.720
11        1004    25       -0.002746948   8              2021-10-07 07:24:30.720
10        1004    24       0.9695401      33             2021-10-05 11:37:51.430
9         1005    23       0.9731983      33             2021-10-05 08:00:10.490
8         1005    22       0.01013499     11             2021-10-01 07:12:07.470
8         1005    21       -0.007311231   11             2021-10-01 07:12:07.470
7         1005    20       -0.0003634033  10             2021-10-01 07:12:07.470
7         1005    19       -0.2021408     10             2021-10-01 07:12:07.470
6         1005    18       -0.002507007   9              2021-09-30 13:00:57.260
6         1005    17       0.001181299    9              2021-09-30 13:00:57.260
5         1005    16       1.007857       8              2021-09-30 12:39:50.280
5         1005    15       1.000333       8              2021-09-30 12:39:50.280
5         1005    14       0.9913442      8              2021-09-30 12:39:50.280
5         1005    13       0.002449243    8              2021-09-30 12:39:50.280
5         1005    12       -0.002550488   8              2021-09-30 12:39:50.280
4         1004    11       -0.02970417    11             2021-09-30 06:57:33.160
4         1004    10       -0.0007542603  11             2021-09-30 06:57:33.160
3         1004    9        -0.005267761   10             2021-09-30 06:57:33.160
3         1004    8        -0.2038888     10             2021-09-30 06:57:33.160
2         1004    7        -0.007525305   9              2021-09-30 06:56:59.060
2         1004    6        -0.004998779   9              2021-09-30 06:56:59.060
1         1004    5        0.9935537      8              2021-09-29 12:34:08.090
1         1004    4        0.9952038      8              2021-09-29 12:34:08.090
1         1004    3        0.9978707      8              2021-09-29 12:34:08.090
1         1004    2        -0.0006630127  8              2021-09-29 12:34:08.090
1         1004    1        0.0002386719   8              2021-09-29 12:34:08.090
I'm unsure how to achieve the desired output given the repeating data. Also for some serials there can be more than 1 repeat as shown in the example.
Happy to provide any extra information required.
Many Thanks.
Code to Generate Tables
create table my_measurement_steps(id int, serial int, measurementid int, date datetime);
create table my_measurement_data(id int, serial int, my_data float(7), measurementid int);
insert into my_measurement_steps values
(1,1004,8,'2021-09-29 12:34:08.090'),
(2,1004,9,'2021-09-30 06:56:59.060'),
(3,1004,10,'2021-09-30 06:57:33.160'),
(4,1004,11,'2021-09-30 06:57:33.160'),
(5,1005,8,'2021-09-30 12:39:50.280'),
(6,1005,9,'2021-09-30 13:00:57.260'),
(7,1005,10,'2021-10-01 07:12:07.470'),
(8,1005,11,'2021-10-01 07:12:07.470'),
(9,1004,33,'2021-10-05 08:00:10.490'),
(10,1005,33,'2021-10-05 11:37:51.430'),
(11,1004,8,'2021-10-07 07:24:30.720'),
(12,1004,9,'2021-10-07 07:24:49.500'),
(13,1004,10,'2021-10-07 07:56:31.530'),
(14,1004,11,'2021-10-07 07:56:31.530'),
(15,1004,33,'2021-10-12 07:55:58.100');
insert into my_measurement_data values
(1,1004,0.0002386719,8),
(2,1004,-0.0006630127,8),
(3,1004,0.9978707,8),
(4,1004,0.9952038,8),
(5,1004,0.9935537,8),
(6,1004,-0.004998779,9),
(7,1004,-0.007525305,9),
(8,1004,-0.2038888,10),
(9,1004,-0.005267761,10),
(10,1004,-0.0007542603,11),
(11,1004,-0.02970417,11),
(12,1005,-0.002550488,8),
(13,1005,0.002449243,8),
(14,1005,0.9913442,8),
(15,1005,1.000333,8),
(16,1005,1.007857,8),
(17,1005,0.001181299,9),
(18,1005,-0.002507007,9),
(19,1005,-0.2021408,10),
(20,1005,-0.0003634033,10),
(21,1005,-0.007311231,11),
(22,1005,0.01013499,11),
(23,1004,0.9695401,33),
(24,1005,0.9731983,33),
(25,1004,-0.002746948,8),
(26,1004,-0.002975781,8),
(27,1004,0.9917302,8),
(28,1004,1.003897,8),
(29,1004,1.012242,8),
(30,1004,0.002392432,9),
(31,1004,0.003044653,9),
(32,1004,-0.1996608,10),
(33,1004,-0.01728721,10),
(34,1004,-0.0003081787,11),
(35,1004,-0.03252285,11),
(36,1004,0.9496555,33);
Edits
Added datestamp to measurement step table - sqlfiddle not working so can't update.
All tables now updated and sqlfiddle
Removed section and added desired output
You want to detect blocks of rows belonging together.
When sorting my_measurement_steps we see that serial/measurementid 1004/8 occurs twice for instance, once in row #1 and then again in row #11.
When sorting my_measurement_data we see about the same thing. The serial/measurementid 1004/8 occurs in two blocks, once in rows #1-5 and then again in rows #25-29.
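You want to join the nth occurrence of each serial/measurementid in my_measurement_steps with its nth occurrence in my_measurement_data.
The detection of such blocks is called a gaps and islands problem. It can be solved with two concurrent row counts: number all rows by id, number the rows within each serial/measurementid by id, and subtract one from the other. The difference stays constant within a block of consecutive rows and changes from one block to the next.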
with data_groups_found as
(
select
my_measurement_data.*,
row_number() over (order by id) -
row_number() over (partition by serial, measurementid order by id) as grp
from my_measurement_data
)
, data_groups_numbered as
(
select
data_groups_found.*,
dense_rank() over (partition by serial, measurementid order by grp) as grp_id
from data_groups_found
)
, steps_numbered as
(
select
my_measurement_steps.*,
row_number() over (partition by serial, measurementid order by id) as grp_id
from my_measurement_steps
)
select *
from steps_numbered s
left join data_groups_numbered d
on d.serial = s.serial
and d.measurementid = s.measurementid
and d.grp_id = s.grp_id
order by s.id, d.id;
Demo: http://sqlfiddle.com/#!18/df3fc/6
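To see why the difference of the two row numbers marks the blocks, here is a minimal sketch on a six-row sample. It uses Python's built-in sqlite3 module as a stand-in for SQL Server (assuming the bundled SQLite is 3.25 or newer, which is needed for window functions). The table name sample_data and its rows are made up for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption:
# SQLite 3.25+ so that window functions are available).
conn = sqlite3.connect(":memory:")
conn.execute("create table sample_data(id int, serial int, measurementid int)")
conn.executemany(
    "insert into sample_data values (?, ?, ?)",
    [(1, 1004, 8), (2, 1004, 8), (3, 1004, 9),
     (4, 1004, 9), (5, 1004, 8), (6, 1004, 8)],
)

islands = conn.execute("""
    with found as (
        select id, serial, measurementid,
               -- overall position minus position within the partition:
               -- constant inside one block, different for the next block
               row_number() over (order by id)
             - row_number() over (partition by serial, measurementid order by id) as grp
        from sample_data
    )
    select id, measurementid, grp,
           -- turn the raw difference into 1, 2, 3, ... per serial/measurementid
           dense_rank() over (partition by serial, measurementid order by grp) as grp_id
    from found
    order by id
""").fetchall()

for row in islands:
    print(row)  # (id, measurementid, grp, grp_id)
```

In this sample the two blocks of measurementid 8 get grp_id 1 and 2, while the single block of measurementid 9 gets grp_id 1. The raw grp value is only meaningful within one serial/measurementid partition (here the value 2 shows up for both 9 and the second block of 8), which is why dense_rank is partitioned the same way.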
I have created two tables: one is customers and the other is ord.
select * from customers;
Customer table
1 101 jun 23 yyyy 15000
2 102 jas 24 zzzz 10000
3 103 fat 20 kkkk 20000
4 104 jini 40 llll 30000
5 105 michael 30 dddd 25000
6 106 das 25 hhhh 10000
7 107 vijay 26 mmmm 12000
8 108 thanku 31 jjjj 26000
9 109 vishnu 34 gggg 24000
10 110 vas 28 ffff 18000
select * from ord;
This is order table
1 12/11/2013 1:00:00 AM 102 2500
2 202 12/11/2013 4:14:17 AM 102 3000
3 203 12/9/2013 9:18:16 PM 103 2000
4 204 12/8/2013 12:00:00 PM 102 1000
5 205 12/24/2013 107 2000
This is the join query that I have used:
select c.name,c.salary,o.amount
from CUSTOMERS c
inner join ord o
on c.id=o.customer_id;
then the resulting table is
1 jas 10000 1000
2 jas 10000 3000
3 jas 10000 2500
4 fat 20000 2000
5 vijay 12000 2000
I want resulting table like this
1 jas 10000 6500
2 fat 20000 2000
3 vijay 12000 2000
Please help me solve this.
Grouping by c.name, c.salary with sum(o.amount) is what you want:
select c.name, c.salary, sum(o.amount )
from CUSTOMERS c
inner join ord o on c.id=o.customer_id
group by c.name, c.salary;
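As a quick check of the aggregate, here is a minimal sketch that runs the same kind of query against a cut-down copy of the question's data. It uses Python's built-in sqlite3 module rather than the asker's database, and the table definitions (only id, name, salary and customer_id, amount) are simplified assumptions, since the full column list is not given in the question.

```python
import sqlite3

# Simplified tables: only the columns the query needs (assumed layout).
conn = sqlite3.connect(":memory:")
conn.execute("create table customers(id int, name text, salary int)")
conn.execute("create table ord(customer_id int, amount int)")
conn.executemany(
    "insert into customers values (?, ?, ?)",
    [(101, "jun", 15000), (102, "jas", 10000),
     (103, "fat", 20000), (107, "vijay", 12000)],
)
conn.executemany(
    "insert into ord values (?, ?)",
    [(102, 2500), (102, 3000), (103, 2000), (102, 1000), (107, 2000)],
)

totals = conn.execute("""
    select c.name, c.salary, sum(o.amount) as total_amount
    from customers c
    inner join ord o on c.id = o.customer_id
    -- c.id is added to the grouping so two customers who share a
    -- name and salary are not merged into one row
    group by c.id, c.name, c.salary
    order by c.id
""").fetchall()

print(totals)
```

This returns one row per customer with orders (jas 6500, fat 2000, vijay 2000). The customer jun has no orders and is dropped by the inner join.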
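Try this and see if it works. Note that grouping by column position (group by 1,2) is not portable: MySQL and PostgreSQL accept it, but SQL Server does not.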
select c.name,c.salary,sum(o.amount)
from CUSTOMERS c
inner join ord o
on c.id=o.customer_id
group by 1,2;
Thanks.
select c.name,c.salary,SUM(o.amount )
from CUSTOMERS c
inner join ord o
on c.id=o.customer_id
GROUP BY c.name,c.salary
I think this will work
Use Left Join or RIGHT JOIN
select c.name,c.salary,o.amount
from CUSTOMERS c
left join ord o
on c.id=o.customer_id;
I have 2 tables. One of the tables has 7 rows and the other has 5 rows. The tables have their primary key in common. I want to combine both tables this way:
If I have a Table
English French
-------------------- --------------------
one Un
two Deux
three Trois
four Quatre
four Quattro
five Cinq
five Cinco
And another one:
English French
-------------------- --------------------
one aaaaa
two bbbbb
three ccccc
four
five
I want to have a table like this:
English French
-------------------- --------------------
one Un
one aaaaa
two Deux
two bbbbb
three Trois
three ccccc
four Quatre
four Quattro
four --------
five Cinq
five Cinco
five ----------
I tried using a join, but it produces every combination of the rows for the values four and five (a Cartesian product). How can I go about doing this? Thanks.
Edit: SQL query:
SELECT l.date_location, l.duree, r.km_restitution, r.km_parcouru
FROM locations l, restitutions r
UNION
SELECT l.num_client, l.date_location, l.duree, r.km_restitution, r.km_parcouru
FROM locations l, restitutions r
Restitutions
id_agence num_immatriculation num_client km_restitution km_parcouru state date_restitution
1 406BON69 1002 30000 1000 BON 29-MAY-10
3 785CIM13 1001 56580 80 BON 09-AUG-08
5 800BBB75 1000 2020 20 BON 24-APR-11
4 307VXN78 1000 20040 40 BON 28-JAN-11
2 290UTT92 1004 30030 30 BON 01-AUG-10
5 777SET13 1005 4030 30 BON 26-APR-11
2 179CLV92 1004 15015 15 BON 03-FEB-11
5 400AAA75 1003 1020 20 BON 18-SEP-11
5 666NEF69 1004 3040 40 BON 15-APR-11
2 111AAA75 1001 20020 20 BON 21-DEC-09
1 333CCC78 1001 43250 40 BON 27-DEC-09
2 260CDE95 1003 79000 430 BON 10-SEP-09
4 307VXN78 1003 20090 90 BON 11-FEB-11
1 123ABC78 1003 10010 10 BON 04-OCT-10
1 222BBB77 1001 9050 50 BON 23-DEC-09
Locations
id_agence num_immatricul num_client duree date_location
2 406BON69 1002 20 10-MAY-10
3 785CIM13 1001 3 07-AUG-08
5 800BBB75 1000 7 18-APR-11
4 307VXN78 1000 5 24-JAN-11
1 290UTT92 1004 1 31-JUL-10
5 777SET13 1005 4 23-APR-11
1 179CLV92 1004 5 30-JAN-11
5 400AAA75 1003 2 17-SEP-11
2 123ABC78 1003 4 01-OCT-10
5 666NEF69 1004 5 11-APR-11
1 111AAA75 1001 2 20-DEC-09
1 222BBB77 1001 2 22-DEC-09
1 333CCC78 1001 3 25-DEC-09
1 260CDE95 1003 10 01-SEP-09
4 307VXN78 1003 13 30-JAN-11
2 123ABC78 1003 8 20-NOV-11
2 406BON69 1002 10 20-NOV-11
Desired Result
id_agence num_immatricul num_client duree date_location date_restitution
2 406BON69 1002 20 10-MAY-10 date_restitution
3 785CIM13 1001 3 07-AUG-08 date_restitution
5 800BBB75 1000 7 18-APR-11 date_restitution
4 307VXN78 1000 5 24-JAN-11 date_restitution
1 290UTT92 1004 1 31-JUL-10 date_restitution
5 777SET13 1005 4 23-APR-11 date_restitution
1 179CLV92 1004 5 30-JAN-11 date_restitution
5 400AAA75 1003 2 17-SEP-11 date_restitution
2 123ABC78 1003 4 01-OCT-10 date_restitution
5 666NEF69 1004 5 11-APR-11 date_restitution
1 111AAA75 1001 2 20-DEC-09 date_restitution
1 222BBB77 1001 2 22-DEC-09 date_restitution
1 333CCC78 1001 3 25-DEC-09 date_restitution
1 260CDE95 1003 10 01-SEP-09 date_restitution
4 307VXN78 1003 13 30-JAN-11 date_restitution
2 123ABC78 1003 8 20-NOV-11 ----------------
2 406BON69 1002 10 20-NOV-11 ---------------
Apart from the header row, the cells where I put date_restitution contain real dates.
You could use a UNION:
select English, French from Table1
UNION ALL
select English, French from Table2
or a full outer join
select distinct coalesce(T1.English, T2.English), coalesce(T1.French, T2.French)
from Table1 T1
full outer join Table2 T2 on T1.English = T2.English
EDIT:
Assuming you want restitutions.date_restitution to appear in place of date_location for restitution records -
SELECT l.num_client, l.date_location, l.duree, to_number(null) km_restitution, to_number(null) km_parcouru
FROM locations l
UNION ALL
SELECT r.num_client, r.date_restitution date_location, 0 duree, r.km_restitution, r.km_parcouru
FROM restitutions r
FURTHER EDIT (based on supplied results):
select l.id_agence,
l.num_immatricul,
l.num_client,
l.duree,
l.date_location,
decode(r.date_restitution, NULL,'----------------', 'date_restitution')
as date_restitution -- or just r.date_restitution
from locations l
left outer join restitutions r
on l.id_agence = r.id_agence and
l.num_immatricul = r.num_immatriculation and
l.num_client = r.num_client and
l.date_location <= r.date_restitution
You actually need a union:
SELECT English, French FROM T1
UNION
SELECT English, French FROM T2
UNION removes duplicate rows from the combined result. If you don't care about duplicates, you can use UNION ALL, which keeps every row from both queries.
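The difference between the two operators can be sketched on a tiny sample. This uses Python's built-in sqlite3 module rather than the asker's database, and the tables t1 and t2 with their rows are made up for illustration, with one row deliberately present in both tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t1(English text, French text)")
conn.execute("create table t2(English text, French text)")
conn.executemany("insert into t1 values (?, ?)",
                 [("one", "Un"), ("four", "Quatre")])
# ("one", "Un") also exists in t1; ("four", NULL) has no French value.
conn.executemany("insert into t2 values (?, ?)",
                 [("one", "Un"), ("four", None)])

# UNION: rows that are identical across both queries are returned once.
union_rows = conn.execute("""
    select English, French from t1
    union
    select English, French from t2
""").fetchall()

# UNION ALL: every row from both queries is returned, duplicates included.
union_all_rows = conn.execute("""
    select English, French from t1
    union all
    select English, French from t2
""").fetchall()

print(len(union_rows), len(union_all_rows))
```

With this sample, UNION yields three rows and UNION ALL yields four, because the shared ("one", "Un") row is kept twice. UNION ALL also skips the duplicate-elimination step, so it is usually the cheaper choice when the inputs cannot overlap.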
Edit after OP's comment:
SELECT l.num_client, l.id_agence, l.num_immatricul
FROM locations l
UNION
SELECT r.num_client, r.id_agence, r.num_immatriculation
FROM restitutions r
The following should do it.
SELECT tab1.English, tab1.French FROM tab1
UNION
SELECT tab2.English, tab2.French FROM tab2
For other readers who might have the same problem: from my experience with it, it is a good idea to merge the tables locations and restitutions, since both have almost the same attributes and data. I finally decided to change my database and create a new table that contains the attributes of both location and restitution, setting values that are not available to NULL. This removes a lot of joins between the tables and makes the queries easier to handle.
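A minimal sketch of what such a merged table could look like, using Python's built-in sqlite3 module. The table name rentals, the trimmed column list and the ISO date strings are assumptions for illustration, not the schema I actually ended up with. The two sample rows are taken from the data above for vehicle 406BON69.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical merged table: the restitution columns stay NULL
# until the vehicle has been returned.
conn.execute("""
    create table rentals(
        num_immatricul   text,
        num_client       int,
        duree            int,
        date_location    text,
        km_restitution   int,
        km_parcouru      int,
        date_restitution text
    )
""")
conn.executemany("insert into rentals values (?, ?, ?, ?, ?, ?, ?)", [
    ("406BON69", 1002, 20, "2010-05-10", 30000, 1000, "2010-05-29"),
    ("406BON69", 1002, 10, "2011-11-20", None, None, None),  # not returned yet
])

# No join needed: a missing restitution is just a NULL in the same row.
rows = conn.execute("""
    select num_immatricul, date_location,
           coalesce(date_restitution, '----------') as date_restitution
    from rentals
    order by date_location
""").fetchall()

print(rows)
```

The second rental has no restitution yet, so its date_restitution comes back as the placeholder, which matches the last rows of the desired result without any outer join.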