SQL combine two records based on one value - sql

Update - work done in SQL-92
I work in SQL reporting tool and trying to combine two records into one. Let's say there as some duplicates were time got split into two values and hence the duplication. Basically any values that are not duplicated should be added
wo---text---time---value
1----test---5------1
1----test---2------a
3----aaaa---3------1
4----bbbb---4------2
Results
wo---text---time----value
1----test---7--------1a
3----aaaa---3--------1
4----bbbb---4--------2
I tried:
SELECT ....
FROM ....
GROUP BY wo SUM (time) but that did not even work.

Set-up:
create table so48345659a
(
wo integer,
text varchar(4),
time integer,
value varchar(2)
);
create table so48345659b
(
wo integer,
text varchar(4),
time integer,
value varchar(2)
);
insert into so48345659a (wo, text, time, value) values (1, 'test', 5, '1');
insert into so48345659a (wo, text, time, value) values (1, 'test', 2, 'a');
insert into so48345659a (wo, text, time, value) values (3, 'aaaa', 3, '1');
insert into so48345659a (wo, text, time, value) values (4, 'bbbb', 4, '2');
insert into so48345659b (wo, text, time, value) values (1, 'test', 7, '1a');
insert into so48345659b (wo, text, time, value) values (3, 'aaaa', 3, '1');
insert into so48345659b (wo, text, time, value) values (4, 'bbbb', 4, '2');
Union, by default removes duplicates
select wo, text, time, value from so48345659a
union
select wo, text, time, value from so48345659b;
Result:
wo | text | time | value
----+------+------+-------
1 | test | 7 | 1a
1 | test | 2 | a
3 | aaaa | 3 | 1
1 | test | 5 | 1
4 | bbbb | 4 | 2
(5 rows)
So now run sum on the union
select
wo,
sum(time) as total_time
from
(
select wo, text, time, value from so48345659a
union
select wo, text, time, value from so48345659b
) x
group by
wo;
Result:
wo | total_time
----+------------
3 | 3
1 | 14
4 | 4
(3 rows)
From your supplementary question (22-Jan-2017), I guess you mean that you have one table that contains duplicate rows. Is that right?
If so, it might look like this:
select * from so48345659c;
wo | text | time | value
----+------+------+-------
1 | test | 5 | 1
1 | test | 2 | a
3 | aaaa | 3 | 1
4 | bbbb | 4 | 2
1 | test | 7 | 1a
3 | aaaa | 3 | 1
4 | bbbb | 4 | 2
(7 rows)
So then you get the sum of the times, ignoring duplicate rows, like this:
select
wo,
sum(time) as total_time
from
(
select distinct wo, text, time, value from so48345659c
) x
group by
wo;
wo | total_time
----+------------
3 | 3
1 | 14
4 | 4
(3 rows)

With just two values, you can do:
select wo, text, sum(time) as time, concat(min(value), max(value)) as value
from t
group by wo, text;
This uses the fact that the standard representation of 1 has a value less than a.
Most databases support string aggregation of some sort (group_concat(), listagg(), and string_agg() are typical functions). You can use one of these for a more general solution.

Related

Creating column for every group in group by

Suppose I have a table T which has entries as follows:
id | type | value |
-------------------------
1 | A | 7
1 | B | 8
2 | A | 9
2 | B | 10
3 | A | 11
3 | B | 12
1 | C | 13
2 | C | 14
For each type, I want a different column. Since the number of types is exhaustive, I would like all different types to be enumerated and a corresponding column for each. I wanted to make id a primary key for the table.
So, the desired output is something like:
id | A's value | B's value | C's value
------------------------------------------
1 | 7 | 8 | 13
2 | 9 | 10 | 14
3 | 11 | 12 | NULL
Please note that this is a simplified version. The actual table T is derived from a much bigger table using group by. And for each group, I would like a separate column. Is that even possible?
Use conditional aggregation:
select id,
max(case when type = 'A' then value end) as a_value,
max(case when type = 'B' then value end) as b_value,
max(case when type = 'C' then value end) as c_value
from t
group by id;
I'd recommend looking into the PIVOT function:
https://docs.snowflake.com/en/sql-reference/constructs/pivot.html
The main blocker with this function though is the list of values for the pivot_column needs to be
pre-determined. To do this, I normally use the LISTAGG function:
https://docs.snowflake.com/en/sql-reference/functions/listagg.html
I've included a query below to show you how to build that string,
and doing this together in a script like
Python or even a Stored Procedure should be fairly straightforward (build the pivot_column, build the aggregate/pivot command, execute the aggregate/pivot command).
I hope this helps...Rich
CREATE OR REPLACE TABLE monthly_sales(
empid INT,
amount INT,
month TEXT)
AS SELECT * FROM VALUES
(1, 10000, 'JAN'),
(1, 400, 'JAN'),
(2, 4500, 'JAN'),
(2, 35000, 'JAN'),
(1, 5000, 'FEB'),
(1, 3000, 'FEB'),
(2, 200, 'FEB'),
(2, 90500, 'FEB'),
(1, 6000, 'MAR'),
(1, 5000, 'MAR'),
(2, 2500, 'MAR'),
(2, 9500, 'MAR'),
(1, 8000, 'APR'),
(1, 10000, 'APR'),
(2, 800, 'APR'),
(2, 4500, 'APR');
SELECT *
FROM monthly_sales
PIVOT(SUM(amount)
FOR month IN ('JAN', 'FEB', 'MAR', 'APR'))
AS p
ORDER BY empid;
SELECT LISTAGG( DISTINCT ''''||month||'''', ', ' )
FROM monthly_sales;

SQL Joins with NOT IN displays incorrect data

I have 3 tables as below, and I need data where Expense.Expense_Code Should not be availalbe in Income.Income_Code.
Table: Base
+----+-----------+----------------+
| ID | Reference | Reference_Name |
+----+-----------+----------------+
| 1 | 10000 | AAAA |
| 2 | 10001 | BBBB |
| 3 | 10002 | CCCC |
+----+-----------+----------------+
Table: Expense
+-----+---------+--------------+----------------+
| EID | BASE_ID | Expense_Code | Expense_Amount |
+-----+---------+--------------+----------------+
| 1 | 1 | I0001 | 25 |
| 2 | 1 | I0002 | 50 |
| 3 | 2 | I0003 | 75 |
+-----+---------+--------------+----------------+
Table: Income
+------+---------+-------------+------------+
| I_ID | BASE_ID | Income_Code | Income_Amt |
+------+---------+-------------+------------+
| 1 | 1 | I0001 | 10 |
| 2 | 1 | I0002 | 20 |
| 3 | 1 | I0003 | 30 |
+------+---------+-------------+------------+
SELECT DISTINCT Base.Reference,Expense.Expense_Code
FROM Base
JOIN Expense ON Base.ID = Expense.BASE_ID
JOIN Income ON Base.ID = Income.BASE_ID
WHERE Expense.Expense_Code IN ('I0001','I0002')
AND Income.Income _CODE NOT IN ('I0001','I0002')
I expect no data be retured.
However I am getting the result as below:
+-----------+--------------+
| REFERENCE | Expense_Code |
+-----------+--------------+
| 10000 | I0001 |
| 10000 | I0002 |
+-----------+--------------+
For Base.Reference (10000), Expense.Expense_Code='I0001','I0002' the same expense_code is availalbe in Income table therefore I should not get any data.
Am I trying to do something wrong with the joins.
Thanks in advance for your help!
You are not joining EXPENSE and INCOME tables in your query at all. There needs to be a condition to join these tables in order to get desired result. You can also use NOT EXISTS clause. Prefer using NOT EXISTS over NOT IN as it performs better in case there are NULLS allowed in the columns that you're joining on.
SELECT * FROM BASE B
JOIN EXPENSE E ON B.ID=E.BASE_ID
WHERE E.EXPENSE_CODE NOT EXISTS (SELECT I.INCOME_CODE FROM INCOME I WHERE I.I_ID=E.EID)
When the first join is performed, you end with two lines possessing the ID 1, because the relationship between the tables is not 1o1, hence every line of the first table will have joined to it a line coming from the second table. Like so:
Output of the first join statement
Then, when the second part of your statement is executed, the DBMS finds two ID's 1 from the first joined table(BASE+EXPENSE) and 3 from the third table(INCOME).
Again since it's non a 1o1 relationship between tables, every row from the first joined table will have a joined line coming from the second table, like so: Output of the second join statement
Finally, when it reads your where clause and outputs what you see. I highlighted the excluded rows from the where clause
Output of where statement
...I need data where Expense.Expense_Code Should not be availalbe in Income.Income_Code
The following query will retrieve this data:
select b.*, e.*
from base b
join expense e on e.base_id = b.id
left join income i on i.base_id = e.base_id
and e.expense_code = i.income_code
where i.i_id is null
For reference the data script (slightly modified) is:
create table base (
id number(6),
reference number(6),
reference_name varchar2(10)
);
insert into base (id, reference, reference_name) values (1, 10000, 'AAAA');
insert into base (id, reference, reference_name) values (2, 10001, 'BBBB');
insert into base (id, reference, reference_name) values (3, 10002, 'CCCC');
create table expense (
eid number(6),
base_id number(6),
expense_code varchar2(10),
expense_amount number(6)
);
insert into expense (eid, base_id, expense_code, expense_amount) values (1, 1, 'I0001', 25);
insert into expense (eid, base_id, expense_code, expense_amount) values (2, 1, 'I0002', 50);
insert into expense (eid, base_id, expense_code, expense_amount) values (3, 1, 'I0003', 75);
insert into expense (eid, base_id, expense_code, expense_amount) values (4, 2, 'I0004', 101);
create table income (
i_id number(6),
base_id number(6),
income_code varchar2(10),
income_amt number(6)
);
insert into income (i_id, base_id, income_code, income_amt) values (1, 1, 'I0001', 10);
insert into income (i_id, base_id, income_code, income_amt) values (2, 1, 'I0002', 20);
insert into income (i_id, base_id, income_code, income_amt) values (3, 1, 'I0003', 30);
Result:
ID REFERENCE REFERENCE_NAME EID BASE_ID EXPENSE_CODE EXPENSE_AMOUNT
-- --------- -------------- --- ------- ------------ --------------
2 10,001 BBBB 4 2 I0004 101

SELECT check the colum of the max row

Here my row with my first select:
SELECT
user.id, analytic_youtube_demographic.age,
analytic_youtube_demographic.percent
FROM
`user`
INNER JOIN
analytic ON analytic.user_id = user.id
INNER JOIN
analytic_youtube_demographic ON analytic_youtube_demographic.analytic_id = analytic.id
Result:
---------------------------
| id | Age | Percent |
|--------------------------
| 1 |13-17| 19,6 |
| 1 |18-24| 38.4 |
| 1 |25-34| 22.5 |
| 1 |35-44| 11.5 |
| 1 |45-54| 5.3 |
| 1 |55-64| 1.6 |
| 1 |65+ | 1.2 |
| 2 |13-17| 10 |
| 2 |18-24| 10 |
| 2 |25-34| 25 |
| 2 |35-44| 5 |
| 2 |45-54| 25 |
| 2 |55-64| 5 |
| 1 |65+ | 20 |
---------------------------
The max value by user_id:
---------------------------
| id | Age | Percent |
|--------------------------
| 1 |18-24| 38.4 |
| 2 |45-54| 25 |
| 2 |25-34| 25 |
---------------------------
And I need to filter Age in ['25-34', '65+']
I must have at the end :
-----------
| id |
|----------
| 2 |
-----------
Thanks a lot for your help.
Have tried to use MAX(analytic_youtube_demographic.percent). But I don't know how to filter with the age too.
Thanks a lot for your help.
You can use the rank() function to identify the largest percentage values within each user's data set, and then a simple WHERE clause to get those entries that are both of the highest rank and belong to one of the specific demographics you're interested in. Since you can't use windowed functions like rank() in a WHERE clause, this is a two-step process with a subquery or a CTE. Something like this ought to do it:
-- Sample data from the question:
create table [user] (id bigint);
insert [user] values
(1), (2);
create table analytic (id bigint, [user_id] bigint);
insert analytic values
(1, 1), (2, 2);
create table analytic_youtube_demographic (analytic_id bigint, age varchar(32), [percent] decimal(5, 2));
insert analytic_youtube_demographic values
(1, '13-17', 19.6),
(1, '18-24', 38.4),
(1, '25-34', 22.5),
(1, '35-44', 11.5),
(1, '45-54', 5.3),
(1, '55-64', 1.6),
(1, '65+', 1.2),
(2, '13-17', 10),
(2, '18-24', 10),
(2, '25-34', 25),
(2, '35-44', 5),
(2, '45-54', 25),
(2, '55-64', 5),
(2, '65+', 20);
-- First, within the set of records for each user.id, use the rank() function to
-- identify the demographics with the highest percentage.
with RankedDataCTE as
(
select
[user].id,
youtube.age,
youtube.[percent],
[rank] = rank() over (partition by [user].id order by youtube.[percent] desc)
from
[user]
inner join analytic on analytic.[user_id] = [user].id
inner join analytic_youtube_demographic youtube on youtube.analytic_id = analytic.id
)
-- Now select only those records that are (a) of the highest rank within their
-- user.id and (b) either the '25-34' or the '65+' age group.
select
id,
age,
[percent]
from
RankedDataCTE
where
[rank] = 1 and
age in ('25-34', '65+');

Count Based on Columns in SQL Server

I have 3 tables:
SELECT id, letter
FROM As
+--------+--------+
| id | letter |
+--------+--------+
| 1 | A |
| 2 | B |
+--------+--------+
SELECT id, letter
FROM Xs
+--------+------------+
| id | letter |
+--------+------------+
| 1 | X |
| 2 | Y |
| 3 | Z |
+--------+------------+
SELECT id, As_id, Xs_id
FROM A_X
+--------+-------+-------+
| id | As_id | Xs_id |
+--------+-------+-------+
| 9 | 1 | 1 |
| 10 | 1 | 2 |
| 11 | 2 | 3 |
| 12 | 1 | 2 |
| 13 | 2 | 3 |
| 14 | 1 | 1 |
+--------+-------+-------+
I can count all As and Bs with group by. But I want to count As and Bs based on X,Y and Z. What I want to get is below:
+-------+
| X,Y,Z |
+-------+
| 2,2,0 |
| 0,0,2 |
+-------+
X,Y,Z
A 2,2,0
B 0,0,2
What is the best way to do this at MSSQL? Is it an efficent way to use foreach for example?
edit: It is not a duplicate because I just wanted to know the efficent way not any way.
For what you're trying to do without knowing what is inefficient with your current code (because none was provided), a Pivot is best. There are a million resources online and here in the stack overflow Q/A forums to find what you need. This is probably the simplest explanation of a Pivot which I frequently need to remind myself of the complicated syntax of a pivot.
To specifically answer your question, this is the code that shows how the link above applies to your question
First Tables needed to be created
DECLARE #AS AS TABLE (ID INT, LETTER VARCHAR(1))
DECLARE #XS AS TABLE (ID INT, LETTER VARCHAR(1))
DECLARE #XA AS TABLE (ID INT, AsID INT, XsID INT)
Values were added to the tables
INSERT INTO #AS (ID, Letter)
SELECT 1,'A'
UNION
SELECT 2,'B'
INSERT INTO #XS (ID, Letter)
SELECT 1,'X'
UNION
SELECT 2,'Y'
UNION
SELECT 3,'Z'
INSERT INTO #XA (ID, ASID, XSID)
SELECT 9,1,1
UNION
SELECT 10,1,2
UNION
SELECT 11,2,3
UNION
SELECT 12,1,2
UNION
SELECT 13,2,3
UNION
SELECT 14,1,1
Then the query which does the pivot is constructed:
SELECT LetterA, [X],[Y],[Z]
FROM (SELECT A.LETTER AS LetterA
,B.LETTER AS LetterX
,C.ID
FROM #XA C
JOIN #AS A
ON A.ID = C.ASID
JOIN #XS B
ON B.ID = C.XSID
) Src
PIVOT (COUNT(ID)
FOR LetterX IN ([X],[Y],[Z])
) AS PVT
When executed, your results are as follows:
Letter X Y Z
A 2 2 0
B 0 0 2
As i said in comment ... just join and do simple pivot
if object_id('tempdb..#AAs') is not null drop table #AAs
create table #AAs(id int, letter nvarchar(5))
if object_id('tempdb..#XXs') is not null drop table #XXs
create table #XXs(id int, letter nvarchar(5))
if object_id('tempdb..#A_X') is not null drop table #A_X
create table #A_X(id int, AAs int, XXs int)
insert into #AAs (id, letter) values (1, 'A'), (2, 'B')
insert into #XXs (id, letter) values (1, 'X'), (2, 'Y'), (3, 'Z')
insert into #A_X (id, AAs, XXs)
values (9, 1, 1),
(10, 1, 2),
(11, 2, 3),
(12, 1, 2),
(13, 2, 3),
(14, 1, 1)
select LetterA,
ISNULL([X], 0) [X],
ISNULL([Y], 0) [Y],
ISNULL([Z], 0) [Z]
from (
select distinct a.letter [LetterA], x.letter [LetterX],
count(*) over (partition by a.letter, x.letter order by a.letter) [Counted]
from #A_X ax
join #AAs A on ax.AAs = A.ID
join #XXs X on ax.XXs = X.ID
)src
PIVOT
(
MAX ([Counted]) for LetterX in ([X], [Y], [Z])
) piv
You get result as you asked for
LetterA X Y Z
A 2 2 0
B 0 0 2

Get id of max value in group

I have a table and i would like to gather the id of the items from each group with the max value on a column but i have a problem.
SELECT group_id, MAX(time)
FROM mytable
GROUP BY group_id
This way i get the correct rows but i need the id:
SELECT id,group_id,MAX(time)
FROM mytable
GROUP BY id,group_id
This way i got all the rows. How could i achieve to get the ID of max value row for time from each group?
Sample Data
id = 1, group_id = 1, time = 2014.01.03
id = 2, group_id = 1, time = 2014.01.04
id = 3, group_id = 2, time = 2014.01.04
id = 4, group_id = 2, time = 2014.01.02
id = 5, group_id = 3, time = 2014.01.01
and from that i should get id: 2,3,5
Thanks!
Use your working query as a sub-query, like this:
SELECT `id`
FROM `mytable`
WHERE (`group_id`, `time`) IN (
SELECT `group_id`, MAX(`time`) as `time`
FROM `mytable`
GROUP BY `group_id`
)
Have a look at the below demo
DROP TABLE IF EXISTS mytable;
CREATE TABLE mytable(id INT , group_id INT , time_st DATE);
INSERT INTO mytable VALUES(1, 1, '2014-01-03'),(2, 1, '2014-01-04'),(3, 2, '2014-01-04'),(4, 2, '2014-01-02'),(5, 3, '2014-01-01');
/** Check all data **/
SELECT * FROM mytable;
+------+----------+------------+
| id | group_id | time_st |
+------+----------+------------+
| 1 | 1 | 2014-01-03 |
| 2 | 1 | 2014-01-04 |
| 3 | 2 | 2014-01-04 |
| 4 | 2 | 2014-01-02 |
| 5 | 3 | 2014-01-01 |
+------+----------+------------+
/** Query for Actual output**/
SELECT
id
FROM
mytable
JOIN
(
SELECT group_id, MAX(time_st) as max_time
FROM mytable GROUP BY group_id
) max_time_table
ON mytable.group_id = max_time_table.group_id AND mytable.time_st = max_time_table.max_time;
+------+
| id |
+------+
| 2 |
| 3 |
| 5 |
+------+
When multiple groups may contain the same value, you could use
SELECT subq.id
FROM (SELECT id,
value,
MAX(time) OVER (PARTITION BY group_id) as max_time
FROM mytable) as subq
WHERE subq.time = subq.max_time
The subquery here generates a new column (max_time) that contains the maximum time per group. We can then filter on time and max_time being identical. Note that this still returns multiple rows per group if the maximum value occurs multiple time within the same group.
Full example:
CREATE TABLE test (
id INT,
group_id INT,
value INT
);
INSERT INTO test (id, group_id, value) VALUES (1, 1, 100);
INSERT INTO test (id, group_id, value) VALUES (2, 1, 200);
INSERT INTO test (id, group_id, value) VALUES (3, 1, 300);
INSERT INTO test (id, group_id, value) VALUES (4, 2, 100);
INSERT INTO test (id, group_id, value) VALUES (5, 2, 300);
INSERT INTO test (id, group_id, value) VALUES (6, 2, 200);
INSERT INTO test (id, group_id, value) VALUES (7, 3, 300);
INSERT INTO test (id, group_id, value) VALUES (8, 3, 200);
INSERT INTO test (id, group_id, value) VALUES (9, 3, 100);
select * from test;
id | group_id | value
----+----------+-------
1 | 1 | 100
2 | 1 | 200
3 | 1 | 300
4 | 2 | 100
5 | 2 | 300
6 | 2 | 200
7 | 3 | 300
8 | 3 | 200
9 | 3 | 100
(9 rows)
SELECT subq.id
FROM (SELECT id,
value,
MAX(value) OVER (partition by group_id) as max_value
FROM test) as subq
WHERE subq.value = subq.max_value;
id
----
3
5
7
(3 rows)