Combine multiple rows with different column values into a single one - sql

I'm trying to create a single row starting from multiple ones and combining them based on different column values; here is the result i reached based on the following query:
select distinct ID, case info when 'name' then value end as 'NAME', case info when 'id' then value end as 'serial'
FROM TABLENAME t
WHERE info = 'name' or info = 'id'
Howerver the expected result should be something along the lines of
I tried with group by clauses but that doesn't seem to work.
The RDBMS is Microsoft SQL Server.
Thanks

SELECT X.ID,MAX(X.NAME)NAME,MAX(X.SERIAL)AS SERIAL FROM
(
SELECT 100 AS ID, NULL AS NAME, '24B6-97F3'AS SERIAL UNION ALL
SELECT 100,'A',NULL UNION ALL
SELECT 200,NULL,'8113-B600'UNION ALL
SELECT 200,'B',NULL
)X
GROUP BY X.ID
For me GROUP BY works

A simple PIVOT operator can achieve this for dynamic results:
SELECT *
FROM
(
SELECT id AS id_column, info, value
FROM tablename
) src
PIVOT
(
MAX(value) FOR info IN ([name], [id])
) piv
ORDER BY id ASC;
Result:
| id_column | name | id |
|-----------|------|------------|
| 100 | a | 24b6-97f3 |
| 200 | b | 8113-b600 |
Fiddle here.

I'm a fan of a self join for things like this
SELECT tName.ID, tName.Value AS Name, tSerial.Value AS Serial
FROM TableName AS tName
INNER JOIN TableName AS tSerial ON tSerial.ID = tName.ID AND tSerial.Info = 'Serial'
WHERE tName.Info = 'Name'
This initially selects only the Name rows, then self joins on the same IDs and now filter to the Serial rows. You may want to change the INNER JOIN to a LEFT JOIN if not everything has a Name and Serial and you want to know which Names don't have a Serial

Related

SQL grouping by distinct values in a multi-value string column

(I want to perform a group-by based on the distinct values in a string column that has multiple values
The said column has a list of strings in a standard format separated by commas. The potential values are only a,b,c,d.
For example the column collection (type: String) contains:
Row 1: ["a","b"]
Row 2: ["b","c"]
Row 3: ["b","c","a"]
Row 4: ["d"]`
The expected output is a count of unique values:
collection | count
a | 2
b | 3
c | 2
d | 1
For all the below i used this table:
create table tmp (
id INT auto_increment,
test VARCHAR(255),
PRIMARY KEY (id)
);
insert into tmp (test) values
("a,b"),
("b,c"),
("b,c,a"),
("d")
;
If the possible values are only a,b,c,d you can try one of this:
Tke note that this will only works if you have not so similar values like test and test_new, because then the test would be joined also with all test_new rows and the count would not match
select collection, COUNT(*) as count from tmp JOIN (
select CONCAT("%", tb.collection, "%") as like_collection, collection from (
select "a" COLLATE utf8_general_ci as collection
union select "b" COLLATE utf8_general_ci as collection
union select "c" COLLATE utf8_general_ci as collection
union select "d" COLLATE utf8_general_ci as collection
) tb
) tb1
ON tmp.test LIKE tb1.like_collection
GROUP BY tb1.collection;
Which will give you the result you want
collection | count
a | 2
b | 3
c | 2
d | 1
or you can try this one
SELECT
(SELECT COUNT(*) FROM tmp WHERE test LIKE '%a%') as a_count,
(SELECT COUNT(*) FROM tmp WHERE test LIKE '%b%') as b_count,
(SELECT COUNT(*) FROM tmp WHERE test LIKE '%c%') as c_count,
(SELECT COUNT(*) FROM tmp WHERE test LIKE '%d%') as d_count
;
The result would be like this
a_count | b_count | c_count | d_count
2 | 3 | 2 | 1
What you need to do is to first explode the collection column into separate rows (like a flatMap operation). In redshift the only way to generate new rows is to JOIN - so let's CROSS JOIN your input table with a static table having consecutive numbers, and take only ones having id less or equal to number of elements in the collection. Then we'll use split_part function to read the item at correct index. Once we have the exploaded table, we'll do a simple GROUP BY.
If your items are stored as JSON array strings ('["a", "b", "c"]') then you can use JSON_ARRAY_LENGTH and JSON_EXTRACT_ARRAY_ELEMENT_TEXT instead of REGEXP_COUNT and SPLIT_PART respectively.
with
index as (
select 1 as i
union all select 2
union all select 3
union all select 4 -- could be substituted with 'select row_number() over () as i from arbitrary_table limit 4'
),
agg as (
select 'a,b' as collection
union all select 'b,c'
union all select 'b,c,a'
union all select 'd'
)
select
split_part(collection, ',', i) as item,
count(*)
from index,agg
where regexp_count(agg.collection, ',') + 1 >= index.i -- only get rows where number of items matches
group by 1

Rotate rows into columns with column names not coming from the row

I've looked at some answers but none of them seem to be applicable to me.
Basically I have this result set:
RowNo | Id | OrderNo |
1 101 1
2 101 10
I just want to convert this to
| Id | OrderNo_0 | OrderNo_1 |
101 1 10
I know I should probably use PIVOT. But the syntax is just not clear to me.
The order numbers are always two. To make things clearer
And if you want to use PIVOT then the following works with the data provided:
declare #Orders table (RowNo int, Id int, OrderNo int)
insert into #Orders (RowNo, Id, OrderNo)
select 1, 101, 1 union all select 2, 101, 10
select Id, [1] OrderNo_0, [2] OrderNo_1
from (
select RowNo, Id, OrderNo
from #Orders
) SourceTable
pivot (
sum(OrderNo)
for RowNo in ([1],[2])
) as PivotTable
Reference: https://learn.microsoft.com/en-us/sql/t-sql/queries/from-using-pivot-and-unpivot?view=sql-server-2017
Note: To build each row in the result set the pivot function is grouping by the columns not begin pivoted. Therefore you need an aggregate function on the column that is being pivoted. You won't notice it in this instance because you have unique rows to start with - but if you had multiple rows with the RowNo and Id you would then find the aggregation comes into play.
As you say there are only ever two order numbers per ID, you could join the results set to itself on the ID column. For the purposes of the example below, I'm assuming your results set is merely selecting from a single Orders table, but it should be easy enough to replace this with your existing query.
SELECT o1.ID, o1.OrderNo AS [OrderNo_0], o2.OrderNo AS [OrderNo_1]
FROM Orders AS o1
INNER JOIN Orders AS o2
ON (o1.ID = o2.ID AND o1.OrderNo <> o2.OrderNo)
From your sample data, simplest you can try to use min and MAX function.
SELECT Id,min(OrderNo) OrderNo_0,MAX(OrderNo) OrderNo_1
FROM T
GROUP BY Id

In postgresql, how can I fill in missing values within a column?

I'm trying to figure out how to fill in values that are missing from one column with the non-missing values from other rows that have the same value on a given column. For instance, in the below example, I'd want all the "1" values to be equal to Bob and all of the "2" values to be equal to John
ID # | Name
-------|-----
1 | Bob
1 | (null)
1 | (null)
2 | John
2 | (null)
2 | (null)
`
EDIT: One caveat is that I'm using postgresql 8.4 with Greenplum and so correlated subqueries are not supported.
CREATE TABLE bobjohn
( ID INTEGER NOT NULL
, zname varchar
);
INSERT INTO bobjohn(id, zname) VALUES
(1,'Bob') ,(1, NULL) ,(1, NULL)
,(2,'John') ,(2, NULL) ,(2, NULL)
;
UPDATE bobjohn dst
SET zname = src.zname
FROM bobjohn src
WHERE dst.id = src.id
AND dst.zname IS NULL
AND src.zname IS NOT NULL
;
SELECT * FROM bobjohn;
NOTE: this query will fail if more than one name exists for a given Id. (and it won't touch records for which no non-null name exists)
If you are on a postgres version >-9, you could use a CTE to fetch the source tuples (this is equivalent to a subquery, but is easier to write and read (IMHO). The CTE also tackles the duplicate values-problem (in a rather crude way):
--
-- CTE's dont work in update queries for Postgres version below 9
--
WITH uniq AS (
SELECT DISTINCT id
-- if there are more than one names for a given Id: pick the lowest
, min(zname) as zname
FROM bobjohn
WHERE zname IS NOT NULL
GROUP BY id
)
UPDATE bobjohn dst
SET zname = src.zname
FROM uniq src
WHERE dst.id = src.id
AND dst.zname IS NULL
;
SELECT * FROM bobjohn;
UPDATE tbl
SET name = x.name
FROM (
SELECT DISTINCT ON (id) id, name
FROM tbl
WHERE name IS NOT NULL
ORDER BY id, name
) x
WHERE x.id = tbl.id
AND tbl.name IS NULL;
DISTINCT ON does the job alone. Not need for additional aggregation.
In case of multiple values for name, the alphabetically first one (according to the current locale) is picked - that's what the ORDER BY id, name is for. If name is unambiguous you can omit that line.
Also, if there is at least one non-null value per id, you can omit WHERE name IS NOT NULL.
If you know for a fact that there are no conflicting values (multiple rows with the same ID but different, non-null names) then something like this will update the table appropriately:
UPDATE some_table AS t1
SET name = (
SELECT name
FROM some_table AS t2
WHERE t1.id = t2.id
AND name IS NOT NULL
LIMIT 1
)
WHERE name IS NULL;
If you only want to query the table and have this information filled in on the fly, you can use a similar query:
SELECT
t1.id,
(
SELECT name
FROM some_table AS t2
WHERE t1.id = t2.id
AND name IS NOT NULL
LIMIT 1
) AS name
FROM some_table AS t1;

Count(*) with 0 for boolean field

Let's say I have a boolean field in a database table and I want to get a tally of how many are 1 and how many are 0. Currently I am doing:
SELECT 'yes' AS result, COUNT( * ) AS num
FROM `table`
WHERE field = 1
UNION
SELECT 'no' AS result, COUNT( * ) AS num
FROM `table`
WHERE field = 0;
Is there an easier way to get the result so that even if there are no false values I will still get:
----------
|yes | 3 |
|no | 0 |
----------
One way would be to outer join onto a lookup table. So, create a lookup table that maps field values to names:
create table field_lookup (
field int,
description varchar(3)
)
and populate it
insert into field_lookup values (0, 'no')
insert into field_lookup values (1, 'yes')
now the next bit depends on your SQL vendor, the following has some Sybase (or SQL Server) specific bits (the outer join syntax and isnull to convert nulls to zero):
select description, isnull(num,0)
from (select field, count(*) num from `table` group by field) d, field_lookup fl
where d.field =* fl.field
you are on the right track, but the first answer will not be correct. Here is a solution that will give you Yes and No even if there is no "No" in the table:
SELECT 'Yes', (SELECT COUNT(*) FROM Tablename WHERE Field <> 0)
UNION ALL
SELECT 'No', (SELECT COUNT(*) FROM tablename WHERE Field = 0)
Be aware that I've checked Yes as <> 0 because some front end systems that uses SQL Server as backend server, uses -1 and 1 as yes.
Regards
Arild
This will result in two columns:
SELECT SUM(field) AS yes, COUNT(*) - SUM(field) AS no FROM table
Because there aren't any existing values for false, if you want to see a summary value for it - you need to LEFT JOIN to a table or derived table/inline view that does. Assuming there's no TYPE_CODES table to lookup the values, use:
SELECT x.desc_value AS result,
COALESCE(COUNT(t.field), 0) AS num
FROM (SELECT 1 AS value, 'yes' AS desc_value
UNION ALL
SELECT 2, 'no') x
LEFT JOIN TABLE t ON t.field = x.value
GROUP BY x.desc_value
SELECT COUNT(*) count, field FROM table GROUP BY field;
Not exactly same output format, but it's the same data you get back.
If one of them has none, you won't get that rows back, but that should be easy enough to check for in your code.

Is it possible to concatenate column values into a string using CTE?

Say I have the following table:
id|myId|Name
-------------
1 | 3 |Bob
2 | 3 |Chet
3 | 3 |Dave
4 | 4 |Jim
5 | 4 |Jose
-------------
Is it possible to use a recursive CTE to generate the following output:
3 | Bob, Chet, Date
4 | Jim, Jose
I've played around with it a bit but haven't been able to get it working. Would I do better using a different technique?
I do not recommend this, but I managed to work it out.
Table:
CREATE TABLE [dbo].[names](
[id] [int] NULL,
[myId] [int] NULL,
[name] [char](25) NULL
) ON [PRIMARY]
Data:
INSERT INTO names values (1,3,'Bob')
INSERT INTO names values 2,3,'Chet')
INSERT INTO names values 3,3,'Dave')
INSERT INTO names values 4,4,'Jim')
INSERT INTO names values 5,4,'Jose')
INSERT INTO names values 6,5,'Nick')
Query:
WITH CTE (id, myId, Name, NameCount)
AS (SELECT id,
myId,
Cast(Name AS VARCHAR(225)) Name,
1 NameCount
FROM (SELECT Row_number() OVER (PARTITION BY myId ORDER BY myId) AS id,
myId,
Name
FROM names) e
WHERE id = 1
UNION ALL
SELECT e1.id,
e1.myId,
Cast(Rtrim(CTE.Name) + ',' + e1.Name AS VARCHAR(225)) AS Name,
CTE.NameCount + 1 NameCount
FROM CTE
INNER JOIN (SELECT Row_number() OVER (PARTITION BY myId ORDER BY myId) AS id,
myId,
Name
FROM names) e1
ON e1.id = CTE.id + 1
AND e1.myId = CTE.myId)
SELECT myID,
Name
FROM (SELECT myID,
Name,
(Row_number() OVER (PARTITION BY myId ORDER BY namecount DESC)) AS id
FROM CTE) AS p
WHERE id = 1
As requested, here is the XML method:
SELECT myId,
STUFF((SELECT ',' + rtrim(convert(char(50),Name))
FROM namestable b
WHERE a.myId = b.myId
FOR XML PATH('')),1,1,'') Names
FROM namestable a
GROUP BY myId
A CTE is just a glorified derived table with some extra features (like recursion). The question is, can you use recursion to do this? Probably, but it's using a screwdriver to pound in a nail. The nice part about doing the XML path (seen in the first answer) is it will combine grouping the MyId column with string concatenation.
How would you concatenate a list of strings using a CTE? I don't think that's its purpose.
A CTE is just a temporarily-created relation (tables and views are both relations) which only exists for the "life" of the current query.
I've played with the CTE names and the field names. I really don't like reusing fields names like id in multiple places; I tend to think those get confusing. And since the only use for names.id is as a ORDER BY in the first ROW_NUMBER() statement, I don't reuse it going forward.
WITH namesNumbered as (
select myId, Name,
ROW_NUMBER() OVER (
PARTITION BY myId
ORDER BY id
) as nameNum
FROM names
)
, namesJoined(myId, Name, nameCount) as (
SELECT myId,
Cast(Name AS VARCHAR(225)),
1
FROM namesNumbered nn1
WHERE nameNum = 1
UNION ALL
SELECT nn2.myId,
Cast(
Rtrim(nc.Name) + ',' + nn2.Name
AS VARCHAR(225)
),
nn.nameNum
FROM namesJoined nj
INNER JOIN namesNumbered nn2 ON nn2.myId = nj.myId
and nn2.nameNum = nj.nameCount + 1
)
SELECT myId, Name
FROM (
SELECT myID, Name,
ROW_NUMBER() OVER (
PARTITION BY myId
ORDER BY nameCount DESC
) AS finalSort
FROM namesJoined
) AS tmp
WHERE finalSort = 1
The first CTE, namesNumbered, returns two fields we care about and a sorting value; we can't just use names.id for this because we need, for each myId value, to have values of 1, 2, .... names.id will have 1, 2 ... for myId = 1 but it will have a higher starting value for subsequent myId values.
The second CTE, namesJoined, has to have the field names specified in the CTE signature because it will be recursive. The base case (part before UNION ALL) gives us records where nameNum = 1. We have to CAST() the Name field because it will grow with subsequent passes; we need to ensure that we CAST() it large enough to handle any of the outputs; we can always TRIM() it later, if needed. We don't have to specify aliases for the fields because the CTE signature provides those. The recursive case (after the UNION ALL) joins the current CTE with the prior one, ensuring that subsequent passes use ever-higher nameNum values. We need to TRIM() the prior iterations of Name, then add the comma and the new Name. The result will be, implicitly, CAST()ed to a larger field.
The final query grabs only the fields we care about (myId, Name) and, within the subquery, pointedly re-sorts the records so that the highest namesJoined.nameCount value will get a 1 as the finalSort value. Then, we tell the WHERE clause to only give us this one record (for each myId value).
Yes, I aliased the subquery as tmp, which is about as generic as you can get. Most SQL engines require that you give a subquery an alias, even if it's the only relation visible at that point.