I have a table that contains a line entry for each item that I need to group into an output with all the values of multiple rows that have the same uniqueID into one column result. The ItemType will be different but the UniqueID and OtherData shouldn't change as it's tied directly to the UniqueID, but I'm not 100% sure as there are well over 2M lines in this DB. I have seen similar questions and they would appear to do what I would like them to do, but the answers are over simplified and usually only include the unique id and the field they want on one line. I need to include about 5 other columns but I don't need to any anything fancy to them. Just trying to group results from the one column as well as return the other columns (that are likely not never be different).
To over simplify the data set this is what it looks like followed by what I'd like to do.
Example Table:
UniqueID | ItemType | OtherData
----------------------------------
1234 | apples | 123.1.123.1
1234 | oranges | 123.1.123.1
2233 | red fish | 123.5.67.2
1234 | grapes | 123.1.123.1
2233 | blue fish | 123.5.67.2
Desired Result:
UniqueID | ItemType | OtherData
----------------------------------
1234 | apples, oranges, grapes | 123.1.123.1
2233 | red fish, blue fish | 123.5.67.2
I've tried a couple versions of SELECT DISTINCT and GROUP BY but that either returns the same as if I didn't or some other undesirable result. Also tried STRING_AGG but that only works on MSSQL2017.
Any help you can provide would be much appreciated, thank you!
Building on the answer from the previous link you can create a cte then execute the query
This will given you the
SELECT Main.UniqueID,
LEFT(Main.ItemTypes,Len(Main.ItemTypes)-1) As "ItemTypes"
FROM
(
SELECT DISTINCT ST2.UniqueID,
(
SELECT ST1.ItemType + ',' AS [text()]
FROM dbo.TheTable ST1
WHERE ST1.UniqueID = ST2.UniqueID
ORDER BY ST1.UniqueID
FOR XML PATH (''), TYPE
).value('text()[1]','nvarchar(max)') ItemTypes
FROM dbo.TheTable ST2
) [Main]
Once you have that you can build this into a cte with the with statement then join back on the table to get the rest of the data.
with ItemTypes as
(
SELECT Main.UniqueID,
LEFT(Main.ItemTypes,Len(Main.ItemTypes)-1) As "ItemTypes"
FROM
(
SELECT DISTINCT ST2.UniqueID,
(
SELECT ST1.ItemType + ',' AS [text()]
FROM dbo.TheTable ST1
WHERE ST1.UniqueID = ST2.UniqueID
ORDER BY ST1.UniqueID
FOR XML PATH (''), TYPE
).value('text()[1]','nvarchar(max)') ItemTypes
FROM dbo.TheTable ST2
) [Main]
)
Select Distinct TheTable.UniqueID, ItemTypes.ItemTypes, TheTable.OtherData
from TheTable join ItemTypes
on (TheTable.UniqueID = ItemTypes.UniqueID)
Results
UniqueID ItemTypes OtherData
--------- -------------------------- --------------------------------
1234 apples,oranges,grapes OtherData
2233 red fish,blue fish OtherData
There are a few expensive operations this will be an expensive query to run. but with 2million rows should be ok with a good server.
What you need is a group concat query in MySQL.
Your query will be
select UniqueID, GROUP_CONCAT(ItemType) ItemTypes, GROUP_CONCAT( DISTINCT OtherData) Otherdata from ExampleTable GROUP BY UniqueID;
A sample reference you can check here:
What group concat does is that it will group all column values against that UniqID col and mark them in a comma separated format. Link to read more in depth about group concat can be found here: Link
Group by all columns except for the item type. Then use the string aggregation function on the item type. In SQL Server this is STRING_AGG, but as you are using an antique version of the DBMS, you will have to emulate it. I am showing the proper query with STRING_AGGhere. Use Google or stackoverflow to see how it is emulated in your old SQL Server version.
select
uniqueid,
string_agg(itemtype, ', ') as itemtypes
otherdata
from mytable
group by uniqueid, otherdata
order by uniqueid, otherdata;
Two remarks:
UniqueID is a funny name for an ID that is not unique but occurs multiple times in the table. Misnomers like this can lead to errors and bad maintainability.
As to "OtherData shouldn't change as it's tied directly to the UniqueID": This indicates that your data model is flawed and your database table is not normalized. You should have two tables instead, one for the unique data, one for the details.
Related
Suppose there is a table which has several identical rows. I can copy the distinct values by
SELECT DISTINCT * INTO DESTINATIONTABLE FROM SOURCETABLE
but if the table has a column named value and for the sake of simplicity its value is 1 for one particular item in that table. Now that row has another 9 duplicates. So the summation of the value column for that particular item is 10. Now I want to remove the 9 duplicates(or copy the distinct value as I mentioned) and for that item now the value should show 10 and not 1. How can this be achieved?
item| value
----+----------------
A | 1
A | 1
A | 1
A | 1
B | 1
B | 1
I want to show this as below
item| value
----+----------------
A | 4
B | 2
Thanks in advance
You can try to use SUM and group by
SELECT item,SUM(value) value
FROM T
GROUP BY item
SQLfiddle:http://sqlfiddle.com/#!18/fac26/1
[Results]:
| item | value |
|------|-------|
| A | 4 |
| B | 2 |
Broadly speaking, you can just us a sum and a GROUP BY clause.
Something like:
SELECT column1, SUM(column2) AS Count
FROM SOURCETABLE
GROUP BY column1
Here it is in action: Sum + Group By
Since your table probably isn't just two columns of data, here is a slightly more complex example showing how to do this to a larger table: SQL Fiddle
Note that I've selected my rows individually so that I can access the necessary data, rather than using
SELECT *
And I have achieved this result without the need for selecting data into another table.
EDIT 2:
Further to your comments, it sounds like you want to alter the actual data in your table rather than just querying it. There may be a more elegant way to do this, but a simple way use the above query to populate a temporary table, delete the contents of the existing table, then move all the data back. To do this in my existing example:
WITH MyQuery AS (
SELECT name, type, colour, price, SUM(number) AS number
FROM MyTable
GROUP BY name, type, colour, price
)
SELECT * INTO MyTable2 FROM MyQuery;
DELETE FROM MyTable;
INSERT INTO MyTable(name, type, colour, price, number)
SELECT * FROM MyTable2;
DROP TABLE MyTable2;
WARNING: If youre going to try this, please use a development environment first (i.e one you don't mind breaking!) to ensure it does exactly what you want it to do. It's imperative that your initial query captures ALL the data you want.
Here is the SQL Fiddle of this example in action: SQL Fiddle
I've been trying to update a query on the DataExplorer that we use on our Gaming SE Site for keeping track of tags without excerpts to include an incremental row number in the results to make reading the returned values easier. There are a number of questions on here that discuss how to do this, such as this one and this one which appear to have worked for those users, but I can't seem to get it work for my situation.
To be clear, I would like something like this:
RowId | TagName | Count | Easy List Formatting
----------------------------------------------
1 | Tag1 | 6 | 1. [tag:tag1] (6)
2 | Tag2 | 6 | 1. [tag:tag2] (6)
3 | Tag3 | 5 | 1. [tag:tag3] (5)
4 | Tag4 | 5 | 1. [tag:tag4] (5)
What I've come up with so far is this:
SELECT ROW_NUMBER() OVER(PARTITION BY TagInfo.[Count] ORDER BY TagInfo.TagName ASC) AS RowId, *
FROM
(
SELECT
TagName,
[Count],
concat('1. [tag:',concat(TagName,concat('] (', concat([Count],')')))) AS [Easy List Formatting]
FROM Tags
LEFT JOIN Posts pe on pe.Id = Tags.ExcerptPostId
LEFT JOIN TagSynonyms on SourceTagName = Tags.TagName
WHERE coalesce(len(pe.Body),0) = 0 and ApprovalDate is null
) AS TagInfo
ORDER BY TagInfo.[Count] DESC, TagInfo.TagName
This yields something close to what I want, but not quite. The RowId column increments, but once the Count column changes, it resets (presumably because of the PARTITION BY). But, if I remove the PARTITION BY, the RowId column becomes what appear to be random numbers.
Is what I want to do achievable given the way the tables are structured? If so, what should the SQL be?
To access the forked query, you can use this link. The original query (before my changes) can be found here if it helps in anyway.
Removing the PARTITION BY is exactly what is needed. The reason that your numbers look random is that the ORDER BY of the outer query is different from the ORDER BY of your ROW_NUMBER(). All you have to do is make those the same, and the output of the sequence project will have the monotonically increasing value you expect.
Specifically:
SELECT ROW_NUMBER() OVER (ORDER BY TagInfo.[Count] DESC, TagInfo.TagName) AS RowId, *
FROM
(
...
) AS TagInfo
ORDER BY TagInfo.[Count] DESC, TagInfo.TagName
Now you aren't partitioning, and the two ORDER BY clauses match, so you'll get your expected output.
For what it's worth, you technically don't really even care about having an ORDER BY in the ROW_NUMBER(), you just want the same order as the final result set. In that case, you can trick the query engine like so by providing a meaningless ORDER BY clause in the ROW_NUMBER():
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS RowId
Boom, done!
A little way around i used its, i add a column to the original table that has a increment value by 1 for example
ALTER TABLE ur_table ADD id INT IDENTITY(1,1)
GO
after that you do the query with order by column id
select * from ur_table (query) order by id
How can I create a table without knowing in advance how many and what columns it exactly holds?
The idea is that I have a table DATA that has 3 columns : ID, NAME, and VALUE
What I need is a way to get multiple values depending on the value of NAME - I can't do it with simple WHERE or JOIN (because I'll need other values - with other NAME values - later on in my query).
Because of the way this table is constructed I want to PIVOT it in order to transform every distinct value of NAME into a column so it will be easier to get to it in my later search.
What I want now is to somehow save this to a temp table / variable so I can use it later on to join with the result of another query...
So example:
Columns:
CREATE TABLE MainTab
(
id int,
nameMain varchar(max),
notes varchar(max)
);
CREATE TABLE SecondTab
(
id int,
id_mainTab, int,
nameSecond varchar(max),
notes varchar(max)
);
CREATE TABLE DATA
(
id int,
id_second int,
name varchar(max),
value varchar(max)
);
Now some example data from the table DATA:
| id | id_second_int | name | value |
|-------------------------------------------------------|
| 1 | 5550 | number | 111115550 |
| 2 | 6154 | address | 1, First Avenue |
| 3 | 1784 | supervisor | John Smith |
| 4 | 3467 | function | Marketing |
| 5 | 9999 | start_date | 01/01/2000 |
::::
Now imagine that 'name' has A LOT of different values, and in one query I'll need to get a lot of different values depending on the value of 'name'...
That's why I pivot it so that number, address, supervisor, function, start_date, ... become colums.
This I do dynamically because of the amount of possible columns - it would take me a while to write all of them in an 'IN' statement - and I don't want to have to remember to add it manually every time a new 'name' value gets added...
herefore I followed http://sqlhints.com/2014/03/18/dynamic-pivot-in-sql-server/
the thing is know that I want the result of my execute(#query) to get stored in a tempTab / variable. I want to use it later on to join it with mainTab...
It would be nice if I could use #cols (which holds the values of DATA.name) but I can't seem to figure out a way to do this.
ADDITIONALLY:
If I use the not dynamic way (write down all the values manually after 'IN') I still need to create a column called status. Now in this column (so far it's NULL everywhere because that value doesn't exist in my unpivoted table) i want to have 'open' or 'closed', depending on the date (let's say i have start_date and end_date,
CASE end_date
WHEN end_date < GETDATE() THEN pivotTab.status = 'closed'
ELSE pivotTab.status = 'open'
Where can I put this statement? Let's say my main query looks like this:
SELECT * FROM(
(SELECT id_second, name, value, id FROM TABLE_DATA) src
PIVOT (max(value) FOR name IN id, number, address, supervisor, function, start_date, end_date, status) AS pivotTab
JOIN SecondTab ON SecondTab.id = pivotTab.id_second
JOIN MainTab ON MainTab.id = SecondTab.id_mainTab
WHERE pivotTab.status = 'closed';
Well, as far as I can understand - you have some select statement and just need to "dump" its result to some temporary table. In this case you can use select into syntax like:
select .....
into #temp_table
from ....
This will create temporary table according to columns in select statement and populate it with data returned by select datatement.
See MDSN for reference.
I'm selecting data from multiple tables and I also need to get maximum "timestamp" on those tables. I will need that to create custom cache control.
tbl_name tbl_surname
id | name id | surname
--------- ------------
0 | John 0 | Doe
1 | Jane 1 | Tully
... ...
I have following query:
SELECT name, surname FROM tbl_name, tbl_surname WHERE tbl_name.id = tbl_surname.id
and I need to add following info to result set:
SELECT MAX(ora_rowscn) FROM (SELECT ora_rowscn FROM tbl_name
UNION ALL
SELECT ora_rowscn FROM tbl_surname);
I was trying to use UNION but I get error - mixing group and not single group data - or something like that, I know why I cannot use the union.
I don't want to split this into 2 calls, because I need the timestamp of the current snapshot I took from DB for my cache management. And between select and the call for MAX the DB could change.
Here is result I want:
John | Doe | 123456
Jane | Tully | 123456
where 123456 is approximate time of last change (insert, update, delete) of tables tbl_name and tbl_surname.
I have read only access to DB, so I cannot create triggers, stored procedures, extra tables etc...
Thanks for any suggestions.
EDIT: The value *ora_rowscn* is assigned per block of rows. So in one table this value can differ per row. I need the maximal value from both (all) tables involved in query.
Try:
SELECT name,
surname,
max(greatest(tbl_name.ora_rowscn, tbl_surname.ora_rowscn)) over () as max_rowscn
FROM tbl_name, tbl_surname
WHERE tbl_name.id = tbl_surname.id
There's no need to aggregate here - just include both ora_rowscn values in your query and take the max:
SELECT
n.name,
n.ora_rowscn as n_ora_rowscn,
s.surname,
s.ora_rowscn as s_ora_rowscn,
greatest(n.ora_rowscn, s.ora_rowscn) as last_ora_rowscn
FROM tbl_name n
join tbl_surname s on n.id = s.id
BTW, I've replaced your old-style joins with ANSI style - better readable, IMHO.
This is staight forward I believe:
I have a table with 30,000 rows. When I SELECT DISTINCT 'location' FROM myTable it returns 21,000 rows, about what I'd expect, but it only returns that one column.
What I want is to move those to a new table, but the whole row for each match.
My best guess is something like SELECT * from (SELECT DISTINCT 'location' FROM myTable) or something like that, but it says I have a vague syntax error.
Is there a good way to grab the rest of each DISTINCT row and move it to a new table all in one go?
SELECT * FROM myTable GROUP BY `location`
or if you want to move to another table
CREATE TABLE foo AS SELECT * FROM myTable GROUP BY `location`
Distinct means for the entire row returned. So you can simply use
SELECT DISTINCT * FROM myTable GROUP BY 'location'
Using Distinct on a single column doesn't make a lot of sense. Let's say I have the following simple set
-id- -location-
1 store
2 store
3 home
if there were some sort of query that returned all columns, but just distinct on location, which row would be returned? 1 or 2? Should it just pick one at random? Because of this, DISTINCT works for all columns in the result set returned.
Well, first you need to decide what you really want returned.
The problem is that, presumably, for some of the location values in your table there are different values in the other columns even when the location value is the same:
Location OtherCol StillOtherCol
Place1 1 Fred
Place1 89 Fred
Place1 1 Joe
In that case, which of the three rows do you want to select? When you talk about a DISTINCT Location, you're condensing those three rows of different data into a single row, there's no meaning to moving the original rows from the original table into a new table since those original rows no longer exist in your DISTINCT result set. (If all the other columns are always the same for a given Location, your problem is easier: Just SELECT DISTINCT * FROM YourTable).
If you don't care which values come from the other columns you can use a (bad, IMHO) MySQL extension to SQL and do:
SELECT * FROM YourTable GROUP BY Location
which will give a result set with one row per location and values for the other columns derived from the original data in an undefined fashion.
Multiple rows with identical values in all columns don't have any sense. OK - the question might be a way to correct exactly that situation.
Considering this table, with id being the PK:
kram=# select * from foba;
id | no | name
----+----+---------------
2 | 1 | a
3 | 1 | b
4 | 2 | c
5 | 2 | a,b,c,d,e,f,g
you may extract a sample for every single no (:=location) by grouping over that column, and selecting the row with minimum PK (for example):
SELECT * FROM foba WHERE id IN (SELECT min (id) FROM foba GROUP BY no);
id | no | name
----+----+------
2 | 1 | a
4 | 2 | c