Please help me, I need to find out a SQL solution for grouping data using SQL Server database.
I'm pretty sure that it could be done in one SQL request but I can't see the trick.
Let' see the problem :
I have a two columns table (please see below an example). I just want to add a new column containing a number or a string which indicates the group
BEFORE :
Col1 | Col2
-----+-----
A | B
B | C
D | E
F | G
G | H
I | I
J | U
AFTER TRANSFORMATION :
Col1 | Col2 | Group
-----+------+------
A | B | 1
B | C | 1
D | E | 2
F | G | 3
G | H | 3
I | I | 4
J | U | 5
In other words: A, B, C are in the same group; D and E too; F, G, H in group 3 ....
Do you have any lookup table to get this group mapping?
Or if you just have a logic defined to decide a group, i would recommend to add a UDF which will return group for supplied values.
SELECT Col1,Col2,GetGroupID(Col1,Col2) AS Group
FROM Table
Your UDF will be something like following
CREATE FUNCTION GetGroupID
(
-- Add the parameters for the function here
#Col1 varchar(10),
#Col2 varchar(10)
)
RETURNS int
AS
BEGIN
DECLARE #groupID int
IF (#Col1="A" AND #Co2 = "B") OR (#Col1="B" AND #Co2 = "C")
BEGIN
SET #groupID = 1
END
IF #Col1="D" AND #Co2 = "E"
BEGIN
SET #groupID = 2
END
-- You can write saveral conditions in the same manner.
return #groupID
END
However, in case you have this mapping defined somewhere in another table, let us know the structure of the table and we can then update the query to join with that table instead of using UDF.
Considering the performance of the query, if the amount of data is huge in your table , it is recommended to have these mappings to one fix table and Join that table in query. Using UDF may harm performance if data amount is huge.
There is absolutely no need for a UDF here. Regardless of whether you are looking to update the table with a new column or simply pull out the data with the grouping applied, you will be best off using a set based solution, ie: create and join to a table.
I am assuming here that you don't have messy data, such as a row with Col1 = 'A' and Col2 = 'F'.
If you are able to add new tables permanently you can use the following to create your lookup table:
create table Col1Groups(Col1 nvarchar(10), GroupNum int);
insert into Col1Groups(Col1,GroupNum) values ('A',1),('B',1),('C',1),('D',2),('E',2),('F',3),('G',3),('H',3);
and then join to it:
select t.Col1
,t.Col2
,g.GroupNum
from Table t
inner join Col1Groups g
on t.Col1 = g.Col1
If you can't, you can just create a derived table via a CTE:
with Col1Groups as
(
select Col1
,GroupNum
from (values('A',1),('B',1),('C',1),('D',2),('E',2),('F',3),('G',3),('H',3)) as x(Col1,GroupNum)
)
select t.Col1
,t.Col2
,g.GroupNum
from Table t
inner join Col1Groups g
on t.Col1 = g.Col1
You get the first rows per group with
select col1, col2 from mytable where col1 not in (select col2 from mytable) or col1 = col2;
We can give these rows numbers with
rank() over (order by col1) as grp
Now we must iterate through the rows to find the ones belonging to those first ones, then those belonging to these, etc. A recursive query.
with cte(col1, col2, grp) as
(
select col1, col2, rank() over (order by col1) as grp
from mytable where col1 not in (select col2 from mytable) or col1 = col2
union all
select mytable.col1, mytable.col2, cte.grp
from cte
join mytable on mytable.col1 = cte.col2
where mytable.col1 <> mytable.col2
)
select * from cte
order by grp, col1;
Additional answer for a more flexible approach
Originally you asked for chains A|B -> B|C, F|G -> G|H etc., but in your comment to my other answer you introduced forks like A|B -> B|C, B|D and I've adjusted my answer.
If you want to go one step further and introduce net-like relations such as A|B -> B|C, D|C, we can no longer follow chains forward only (in the example D belongs to the A group, because though A doesn't lead to D directly, it leads to C and D also leads to C. Here is a way to solve this:
Get all letters from the table (no matter whether in col1 or col2). Then for each of them find related letters (again no matter whether in col1 or col2). And for these again find related letters and so on. That will give you complete groups. But duplicates (as D is in the A group, A is in the D group also), which you can get rid of by simply taking the smallest (or greatest) group key per letter. Then join the Groups to the table.
The query:
with cte(col, grp) as
(
select col, rownum as grp from
(select col1 as col from mytable union select col2 from mytable)
union all
select case when mytable.col1 = cte.col then mytable.col2 else mytable.col1 end, cte.grp
from cte
join mytable on cte.col in (mytable.col1, mytable.col2)
where mytable.col1 <> mytable.col2
)
cycle col set is_cycle to 'y' default 'n'
select mytable.col1, mytable.col2, x.grp
from mytable
join (select col, min(grp) as grp from cte group by col) x on x.col = mytable.col1
order by grp, col;
Related
I have the following table tbl:
column1 | column2 | column 3
-----------------------------------
1 | 'value1' | 3
2 | 'value2' | 4
How to do "pivot" with column names to produce output like:
column1 | 1 | 2
column2 | 'value1' |'value2'
column3 | 3 | 4
As has been commented, the issue of data types is undefined in the question.
If you are OK with all result columns being type text (every data type can be converted to text), you can use one of these:
Plain SQL
WITH cte AS (
SELECT nu.*
FROM tbl t
, LATERAL (
VALUES
(1, t.column1::text)
, (2, t.column2)
, (3, t.column3::text)
) nu(rn, c)
)
SELECT *
FROM (TABLE cte OFFSET 0 LIMIT 3) c1
JOIN (TABLE cte OFFSET 3 LIMIT 3) c2 USING (rn);
The same with useful column names:
WITH cte AS (
SELECT nu.*
FROM tbl t
, LATERAL (
VALUES
('column1', t.column1::text)
, ('column2', t.column2)
, ('column3', t.column3::text)
) nu(rn, c)
)
SELECT * FROM (
SELECT *
FROM (TABLE cte OFFSET 0 LIMIT 3) c1
JOIN (TABLE cte OFFSET 3 LIMIT 3) c2 USING (rn)
) t (key, row1, row2);
Works in any modern version of Postgres.
The SQL string has to be adapted to the number of rows and columns. See fiddles below!
Using a document type as stepping stone
Makes for shorter code.
With many rows and many columns, performance of the SQL solution may scale better because the intermediate derived table is smaller.
(The thread is limited as you can't have more than ~ 1600 table columns in Postgres.)
Since everything is converted to text anyway, hstore seems most efficient. See:
Key value pair in PostgreSQL
SELECT key
, arr[1] AS row1
, arr[2] AS row2
FROM (
SELECT x.key, array_agg(x.value) AS arr
FROM tbl t, each(hstore(t)) x
GROUP BY 1
) sub
ORDER BY 1;
Technically speaking we would have to enforce the right sort order when in array_agg(), but that should work without explicit ORDER BY. To be absolutely sure you can add one: array_agg(x.value ORDER BY t.ctid) Using ctid for lack of information.
You can do the same with JSON functions in (Postgres 9.3+). Just replace each(hstore(t) with json_each_text(row_to_json(t). The rest is identical.
These fiddles demonstrate how to scale each query:
Original example with 2 rows of 3 columns:
db<>fiddle here
Scaled up to 3 rows of 4 columns:
db<>fiddle here
I've a table like this
ID col1 col2 col3 col4
---------------------------------------------------
1 a b c d
2 e f g h
So if I pass the ID 2 it should return all the colum values as separate rows as this
Colum Value
---------------------
ID 2
Col1 e
col2 f
col3 g
col4 h
So all the cells of that single rows been splitted as separate rows.
How can I accomplish this
One way to do it with unpivot and union all.
select 'id' as colu,cast(id as varchar(255)) as val from t where id=2
union all
select colu,val
from t
unpivot (val for colu in (col1,col2,col3,col4)) u
where id=2
You can use cross apply as below:
Select Id as [Column], [Value] from yourcols
cross apply (values(col1), (col2), (col3), (col4)) rws([Value])
where Id = 2
Please refer the below stack over flow link SQL query to split column data into rows
found the question identical as yours.
I have two tables table which are identical in structure but belong to different schemas (schemas A and B). All rows in question will always appear in the A.table but may or may not appear in B.table. B.table is essentially an override for the defaults in A.table.
As such my query uses a COALESCE on each field similar to:
SELECT COALESCE(B.id, A.id) as id,
COALESCE(B.foo, A.foo) as foo,
COALESCE(B.bar, A.bar) as bar
FROM A.table LEFT JOIN B.table ON (A.id = B.id)
WHERE A.id in (1, 2, 3)
This works great, but I also want to add the source of the data. In the example above, assuming id=2 existed in B.table but not 1 or 3, I would want to include some indication that A is the source for 1 and 3 and B is the source for 2.
So the data might look like the following
+---------------------------------+
| id | foo | bar | source |
+---------------------------------+
| 1 | a | b | A |
| 2 | c | d | B |
| 3 | e | f | A |
+---------------------------------+
I don't really care what the value of source is as long as I can distinguish A from B.
I am no pgsql expert (not by a long shot) but I have tinkered around with EXISTS and a subquery but have had no luck so far.
As records showing the default value (from A.table) have NULLs for B.id, all you need is to add this column specification to your query:
CASE WHEN B.id IS NULL THEN 'A' ELSE 'B' END AS Source
The USING clause would simplify the query you have:
SELECT id
, COALESCE(B.foo, A.foo) AS foo
, COALESCE(B.bar, A.bar) AS bar
, CASE WHEN b.id IS NULL THEN 'A' ELSE 'B' END AS source -- like #Terje provided
FROM a
LEFT JOIN b USING (id)
WHERE a.id IN (1, 2, 3);
But typically, this alternative query should serve you better:
SELECT x.* -- or list columns of your choice
FROM (VALUES (1), (2), (3)) t (id)
, LATERAL (
SELECT *, 'B' AS source FROM b WHERE id = t.id
UNION ALL
SELECT *, 'A' FROM a WHERE id = t.id
LIMIT 1
) x
ORDER BY x.id;
Advantages:
You don't have to add another COALESCE construct for every column you want to add to the result.
The same query works for any number of columns in a and b.
The query even works if the column names are not identical. Only number and data types of columns must match.
Of course, you can always list selected, compatible columns as well:
SELECT * -- or list columns of your choice
FROM (VALUES (1), (2), (3)) t (id)
, LATERAL (
SELECT foo, bar, 'B' AS source FROM b WHERE id = t.id
UNION ALL
SELECT foo2, bar17, 'A' FROM a WHERE id = t.id
LIMIT 1
) x
ORDER BY x.id;
The first SELECT determines names, data types and number of columns.
This query doesn't break if columns in b are not defined NOT NULL.
COALESCE cannot tell the difference between b.foo IS NULL and no row with matching id in b. So the source of any result column (except id) can still be 'A', even if the result row says 'B' - if any relevant column in b can be NULL.
My alternative returns all values from b if the row exists - including NULL values. So the result can be different if columns in b can be NULL. It depends on your requirements which behavior is desirable.
Either query assumes that id is defined as primary key (so exactly 1 or 0 rows per given id value).
Related:
Select first record if none match
What is the difference between LATERAL and a subquery in PostgreSQL?
I have following structure of data in Oracle table
COL1 COL2 COL2 GRP_ID
A A B 1
A A B 1
A A C 2
A A B 1
A D E 3
A D E 3
F G H 4
F G H 4
Basically each unique combination of col1, col2 and col3 has different value in GRP_ID column.
I need to replace value in GRP_ID column with database sequence value such that (assuming next value of sequence is 235678):
COL1 COL2 COL2 GRP_ID
A A B 235678
A A B 235678
A A C 235679
A A B 235678
A D E 235680
A D E 235680
F G H 235681
F G H 235681
There are millions of records in the table so I do not want to go through a loop. Reason to use database sequence is that the number provided by sequence will be exposed to customer and therefore it should not repeat when next communication is sent to customer.
Is there a way to do this through SQL?
Thanks!
Don't know if is the fastest way but it works.
CREATE FUNCTION NEXT
RETURN NUMBER IS
v_nextval NUMBER;
BEGIN
v_nextval := NEW_SEQUENCE.nextval;
RETURN(v_nextval);
END;
/
UPDATE EXAMPLE
SET EXAMPLE.GroupID =
(
SELECT G.GroupID FROM
(
SELECT B.Column1, B.Column2, B.Column3, MY_SCHEMA.NEXT() AS GroupID
FROM EXAMPLE B
GROUP BY B.Column1, B.Column2, B.Column3
) G
Where G.Column1 = EXAMPLE.Column1 AND G.Column2 = EXAMPLE.Column2
AND G.Column3 = EXAMPLE.Column3);
SELECT *
FROM EXAMPLE
Basically you have to do a distinct or group by to get all the different groups that you have and use the sequence in a function or you will get the sequence number is not allowed here error from oracle and then do an update.
See complete example on sqlfiddle
http://sqlfiddle.com/#!4/bf261/8
How about:
begin transaction
Get next sequence in a temp var
Disable sequence
update tablename set grp_ID = grp_ID + tmpVar-1
Select into tempVar max(grp_ID) from tableName
Re-seed sequence based value now in tempvar+1
Re-enable sequence
commit and end transaction
But i'm confused by something when you add additional records later, they can't simply do sequence.nextval because it's only needed when col1,col2,col3 doesn't already exist, otherwise it should use the group_ID for that unique combination... just seems odd, doable but odd.
I have two queries :
Queries Simplified excluding Joins
Query 1 : select ProductName,NumberofProducts (in inventory) from Table1.....;
Query 2 : select ProductName, NumberofProductssold from Table2......;
I would like to know how I can get an output as :
ProductName NumberofProducts(in inventory) ProductName NumberofProductsSold
The relationships used for getting the outputs for each query are different.
I need the output this way for my SSRS report .
(I tried the union statement but it doesnt work for the output I want to see. )
Here is an example that does a union between two completely unrelated tables: the Student and the Products table. It generates an output that is 4 columns:
select
FirstName as Column1,
LastName as Column2,
email as Column3,
null as Column4
from
Student
union
select
ProductName as Column1,
QuantityPerUnit as Column2,
null as Column3,
UnitsInStock as Column4
from
Products
Obviously you'll tweak this for your own environment...
I think you are after something like this; (Using row_number() with CTE and performing a FULL OUTER JOIN )
Fiddle example
;with t1 as (
select col1,col2, row_number() over (order by col1) rn
from table1
),
t2 as (
select col3,col4, row_number() over (order by col3) rn
from table2
)
select col1,col2,col3,col4
from t1 full outer join t2 on t1.rn = t2.rn
Tables and data :
create table table1 (col1 int, col2 int)
create table table2 (col3 int, col4 int)
insert into table1 values
(1,2),(3,4)
insert into table2 values
(10,11),(30,40),(50,60)
Results :
| COL1 | COL2 | COL3 | COL4 |
---------------------------------
| 1 | 2 | 10 | 11 |
| 3 | 4 | 30 | 40 |
| (null) | (null) | 50 | 60 |
How about,
select
col1,
col2,
null col3,
null col4
from Table1
union all
select
null col1,
null col2,
col4 col3,
col5 col4
from Table2;
The problem is that unless your tables are related you can't determine how to join them, so you'd have to arbitrarily join them, resulting in a cartesian product:
select Table1.col1, Table1.col2, Table2.col3, Table2.col4
from Table1
cross join Table2
If you had, for example, the following data:
col1 col2
a 1
b 2
col3 col4
y 98
z 99
You would end up with the following:
col1 col2 col3 col4
a 1 y 98
a 1 z 99
b 2 y 98
b 2 z 99
Is this what you're looking for? If not, and you have some means of relating the tables, then you'd need to include that in joining the two tables together, e.g.:
select Table1.col1, Table1.col2, Table2.col3, Table2.col4
from Table1
inner join Table2
on Table1.JoiningField = Table2.JoiningField
That would pull things together for you into however the data is related, giving you your result.
If you mean that both ProductName fields are to have the same value, then:
SELECT a.ProductName,a.NumberofProducts,b.ProductName,b.NumberofProductsSold FROM Table1 a, Table2 b WHERE a.ProductName=b.ProductName;
Or, if you want the ProductName column to be displayed only once,
SELECT a.ProductName,a.NumberofProducts,b.NumberofProductsSold FROM Table1 a, Table2 b WHERE a.ProductName=b.ProductName;
Otherwise,if any row of Table1 can be associated with any row from Table2 (even though I really wonder why anyone'd want to do that), you could give this a look.
Old question, but where others use JOIN to combine unrelated queries to rows in one table, this is my solution to combine unrelated queries to one row, e.g:
select
(select count(*) c from v$session where program = 'w3wp.exe') w3wp,
(select count(*) c from v$session) total,
sysdate
from dual;
which gives the following one-row output:
W3WP TOTAL SYSDATE
----- ----- -------------------
14 290 2020/02/18 10:45:07
(which tells me that our web server currently uses 14 Oracle sessions out of the total of 290 sessions; I log this output without headers in an sqlplus script that runs every so many minutes)
Load each query into a datatable:
http://www.dotnetcurry.com/ShowArticle.aspx?ID=143
load both datatables into the dataset:
http://msdn.microsoft.com/en-us/library/aeskbwf7%28v=vs.80%29.aspx
This is what you can do. Assuming that your ProductName column have common values.
SELECT
Table1.ProductName,
Table1.NumberofProducts,
Table2.ProductName,
Table2.NumberofProductssold
FROM Table1
INNER JOIN Table2
ON Table1.ProductName= Table2.ProductName
Try this:
SELECT ProductName,NumberofProducts ,NumberofProductssold
FROM table1
JOIN table2
ON table1.ProductName = table2.ProductName
Try this:
GET THE RECORD FOR CURRENT_MONTH, LAST_MONTH AND ALL_TIME AND MERGE THEM INTO SINGLE ARRAY
$analyticsData = $this->user->getMemberInfoCurrentMonth($userId);
$analyticsData1 = $this->user->getMemberInfoLastMonth($userId);
$analyticsData2 = $this->user->getMemberInfAllTime($userId);
foreach ($analyticsData2 as $arr) {
foreach ($analyticsData1 as $arr1) {
if ($arr->fullname == $arr1->fullname) {
$arr->last_send_count = $arr1->last_send_count;
break;
}else{
$arr->last_send_count = 0;
}
}
foreach ($analyticsData as $arr2) {
if ($arr->fullname == $arr2->fullname) {
$arr->current_send_count = $arr2->current_send_count;
break;
}else{
$arr->current_send_count = 0;
}
}
}
echo "<pre>";
print_r($analyticsData2);die;