How to Transpose a resultset from SQL - sql

I am using Microsoft SQL Server 2008.
I have a table that looks something like this:
|======================================================|
| RespondentId | QuestionId | AnswerValue | ColumnName |
|======================================================|
| P123 | 1 | Y | CanBathe |
|------------------------------------------------------|
| P123 | 2 | 3 | TimesADay |
|------------------------------------------------------|
| P123 | 3 | 1.00 | SoapPrice |
|------------------------------------------------------|
| P465 | 1 | Y | CanBathe |
|------------------------------------------------------|
| P465 | 2 | 1 | TimesADay |
|------------------------------------------------------|
| P465 | 3 | 0.99 | SoapPrice |
|------------------------------------------------------|
| P901 | 1 | N | CanBathe |
|------------------------------------------------------|
| P901 | 2 | 0 | TimesADay |
|------------------------------------------------------|
| P901 | 3 | 0.00 | SoapPrice |
|------------------------------------------------------|
I would like to flip the rows to be columns so that this table looks like this:
|=================================================|
| RespondentId | CanBathe | TimesADay | SoapPrice |
|=================================================|
| P123 | Y | 3 | 1.00 |
|-------------------------------------------------|
| P465 | Y | 1 | 0.99 |
|-------------------------------------------------|
| P901 | N | 0 | 0.00 |
|-------------------------------------------------|
(the example data here is arbitrarily made up, so its silly)
The source table is a temp table with approximately 70,000 rows.
What SQL would I need to write to do this?
Update
I don't even know if PIVOT is the right way to go.
I don't know what column to PIVOT on.
The documentation mentions <aggregation function> and <column being aggregated> and I don't want to aggregate anything.
Thanks in advance.

It, is required to use an aggregate function if you use PIVOT. However, since your (RespondentId, QuestionId) combination is unique, your "groups" will have only one row, so you can use MIN() as an aggregate function:
SELECT RespondentId, CanBathe, TimesADay, SoapPrice
FROM (SELECT RespondentId, ColumnName, AnswerValue FROM MyTable) AS src
PIVOT (MIN(AnswerValue) FOR ColumnName IN(CanBathe, TimesADay, SoapPrice)) AS pvt
If a group only contain one row, then MIN(value) = value, or in other words: the aggregate function becomes the identity function.

See if this gets you started. Used to have to use CASE statements to make that happen but it looks like some inkling of PIVOT is in SQL Server now.

PIVOT is a start, but the thing with sql queries is that you really need to know what columns to expect in the result set before writing the query. If you don't know this, the last time I checked you have to either resort to dynamic sql or allow the client app that retrieves the data to do the pivot instead.

Related

In Hive, what is the difference between explode() and lateral view explode()

Assume there is a table employee:
+-----------+------------------+
| col_name | data_type |
+-----------+------------------+
| id | string |
| perf | map<string,int> |
+-----------+------------------+
and the data inside this table:
+-----+------------------------------------+--+
| id | perf |
+-----+------------------------------------+--+
| 1 | {"job":80,"person":70,"team":60} |
| 2 | {"job":60,"team":80} |
| 3 | {"job":90,"person":100,"team":70} |
+-----+------------------------------------+--+
I tried the following two queries but they all return the same result:
1. select explode(perf) from employee;
2. select key,value from employee lateral view explode(perf) as key,value;
The result:
+---------+--------+--+
| key | value |
+---------+--------+--+
| job | 80 |
| team | 60 |
| person | 70 |
| job | 60 |
| team | 80 |
| job | 90 |
| team | 70 |
| person | 100 |
+---------+--------+--+
So, what is the difference between them? I did not find suitable examples. Any help is appreciated.
For your particular case both queries are OK. But you can't use multiple explode() functions without lateral view. So, the query below will fail:
select explode(array(1,2)), explode(array(3, 4))
You'll need to write something like:
select
a_exp.a,
b_exp.b
from (select array(1, 2) as a, array(3, 4) as b) t
lateral view explode(t.a) a_exp as a
lateral view explode(t.b) b_exp as b

Count numbers in single row - SQL

is it possible to return count of values in single row?
For example this is test table and I want to count of daily_typing_pages
SQL> SELECT * FROM employee_tbl;
+------+------+------------+--------------------+
| id | name | work_date | daily_typing_pages |
+------+------+------------+--------------------+
| 1 | John | 2007-01-24 | 250 |
| 2 | Ram | 2007-05-27 | 220 |
| 3 | Jack | 2007-05-06 | 170 |
| 3 | Jack | 2007-04-06 | 100 |
| 4 | Jill | 2007-04-06 | 220 |
| 5 | Zara | 2007-06-06 | 300 |
| 5 | Zara | 2007-02-06 | 350 |
+------+------+------------+--------------------+
Result of this count should be : 1610 how ever if I simply count() AROUND it return:
SQL>SELECT COUNT(daily_typing_pages) FROM employee_tbl ;
+---------------------------+
| COUNT(daily_typing_pages) |
+---------------------------+
| 7 |
+---------------------------+
1 row in set (0.01 sec)
So it return number of rows instead of count single row.
Is there some way how to do things like I want without using external programming language which will count it for me?
Thanks
You want SUM instead of COUNT. COUNT merely counts the number of records, you want them summed.
You didn't mention your DBMS, but see for example, for sql server this
Did you mean you want to summarize alle numbers of daily_typing_pages ?
So you can use sum(daily_typing_pages):
SELECT SUM(daily_typing_pages) FROM employee_tbl

SQL for calculated column that chooses from value in own row

I have a table in which several indentifiers of a person may be stored. In this table I would like to create a single calculated identifier column that stores the best identifier for that record depending on what identifiers are available.
For example (some fictional sample data) ....
Table = "Citizens"
Id | LastName | DL-No | SS-No | State-Id-No | Calculated
------------------------------------------------------------------------
1 | Smith | NULL | 374-784-8888 | 7383204848 | ?
2 | Jones | JG892435262 | NULL | NULL | ?
3 | Trask | TSK73948379 | NULL | 9276542119 | ?
4 | Clinton | CL231429888 | 543-123-5555 | 1840430324 | ?
I know the order in which I would like choose identifiers ...
Drivers-License-No
Social-Security-No
State-Id-No
So I would like the calculated identifier column to be part of the table schema. The desired results would be ...
Id | LastName | DL-No | SS-No | State-Id-No | Calculated
------------------------------------------------------------------------
1 | Smith | NULL | 374-784-8888 | 7383204848 | 374-784-8888
2 | Jones | JG892435262 | NULL | 4537409273 | JG892435262
3 | Trask | NULL | NULL | 9276542119 | 9276542119
4 | Clinton | CL231429888 | 543-123-5555 | 1840430324 | CL231429888
IS this possible? If so what SQL would I use to calculate what goes in the "Calculated" column?
I was thinking of something like ..
SELECT
CASE
WHEN ([DL-No] is NOT NULL) THEN [DL-No]
WHEN ([SS-No] is NOT NULL) THEN [SS-No]
WHEN ([State-Id-No] is NOT NULL) THEN [State-Id-No]
AS "Calculated"
END
FROM Citizens
The easiest solution is to use coalesce():
select c.*,
coalesce([DL-No], [SS-No], [State-ID-No]) as calculated
from citizens c
However, I think your case statement will also work, if you fix the syntax to use when rather than where.

Grouped string aggregation / LISTAGG for SQL Server

I'm sure this has been asked but I can't quite find the right search terms.
Given a schema like this:
| CarMakeID | CarMake
------------------------
| 1 | SuperCars
| 2 | MehCars
| CarMakeID | CarModelID | CarModel
-----------------------------------------
| 1 | 1 | Zoom
| 2 | 1 | Wow
| 3 | 1 | Awesome
| 4 | 2 | Mediocrity
| 5 | 2 | YoureSettling
I want to produce a dataset like this:
| CarMakeID | CarMake | CarModels
---------------------------------------------
| 1 | SuperCars | Zoom, Wow, Awesome
| 2 | MehCars | Mediocrity, YoureSettling
What do I do in place of 'AGG' for strings in SQL Server in the following style query?
SELECT *,
(SELECT AGG(CarModel)
FROM CarModels model
WHERE model.CarMakeID = make.CarMakeID
GROUP BY make.CarMakeID) as CarMakes
FROM CarMakes make
http://www.simple-talk.com/sql/t-sql-programming/concatenating-row-values-in-transact-sql/
It is an interesting problem in Transact SQL, for which there are a number of solutions and considerable debate. How do you go about producing a summary result in which a distinguishing column from each row in each particular category is listed in a 'aggregate' column? A simple, and intuitive way of displaying data is surprisingly difficult to achieve. Anith Sen gives a summary of different ways, and offers words of caution over the one you choose...
If it is SQL Server 2017 or SQL Server VNext, Azure SQL database you can use String_agg as below:
SELECT make.CarMakeId, make.CarMake,
CarModels = string_agg(model.CarModel, ', ')
FROM CarModels model
INNER JOIN CarMakes make
ON model.CarMakeId = make.CarMakeId
GROUP BY make.CarMakeId, make.CarMake
Output:
+-----------+-----------+---------------------------+
| CarMakeId | CarMake | CarModels |
+-----------+-----------+---------------------------+
| 1 | SuperCars | Zoom, Wow, Awesome |
| 2 | MehCars | Mediocrity, YoureSettling |
+-----------+-----------+---------------------------+

Merge computed data from two tables back into one of them

I have the following situation (as a reduced example). Two tables, Measures1 and Measures2, each of which store an ID, a Weight in grams, and optionally a Volume in fluid onces. (In reality, Measures1 has a good deal of other data that is irrelevant here)
Contents of Measures1:
+----+----------+--------+
| ID | Weight | Volume |
+----+----------+--------+
| 1 | 100.0000 | NULL |
| 2 | 200.0000 | NULL |
| 3 | 150.0000 | NULL |
| 4 | 325.0000 | NULL |
+----+----------+--------+
Contents of Measures2:
+----+----------+----------+
| ID | Weight | Volume |
+----+----------+----------+
| 1 | 75.0000 | 10.0000 |
| 2 | 400.0000 | 64.0000 |
| 3 | 100.0000 | 22.0000 |
| 4 | 500.0000 | 100.0000 |
+----+----------+----------+
These tables describe equivalent weights and volumes of a substance. E.g. 10 fluid ounces of substance 1 weighs 75 grams. The IDs are related: ID 1 in Measures1 is the same substance as ID 1 in Measures2.
What I want to do is fill in the NULL volumes in Measures1 using the information in Measures2, but keeping the weights from Measures1 (then, ultimately, I can drop the Measures2 table, as it will be redundant). For the sake of simplicity, assume that all volumes in Measures1 are NULL and all volumes in Measures2 are not.
I can compute the volumes I want to fill in with the following query:
SELECT Measures1.ID, Measures1.Weight,
(Measures2.Volume * (Measures1.Weight / Measures2.Weight))
AS DesiredVolume
FROM Measures1 JOIN Measures2 ON Measures1.ID = Measures2.ID;
Producing:
+----+----------+-----------------+
| ID | Weight | DesiredVolume |
+----+----------+-----------------+
| 4 | 325.0000 | 65.000000000000 |
| 3 | 150.0000 | 33.000000000000 |
| 2 | 200.0000 | 32.000000000000 |
| 1 | 100.0000 | 13.333333333333 |
+----+----------+-----------------+
But I am at a loss for how to actually insert these computed values into the Measures1 table.
Preferably, I would like to be able to do it with a single query, rather than writing a script or stored procedure that iterates through every ID in Measures1. But even then I am worried that this might not be possible because the MySQL documentation says that you can't use a table in an UPDATE query and a SELECT subquery at the same time, and I think any solution would need to do that.
I know that one workaround might be to create a new table with the results of the above query (also selecting all of the other non-Volume fields in Measures1) and then drop both tables and replace Measures1 with the newly-created table, but I was wondering if there was any better way to do it that I am missing.
UPDATE Measures1
SET Volume = (Measures2.Volume * (Measures1.Weight / Measures2.Weight))
FROM Measures1 JOIN Measures2
ON Measures1.ID = Measures2.ID;