How can I have UNION respect column aliases? - sql

Sorry for the bad title, I couldn't think of anything better. Feel free to edit.
I have to work with a db table that uses one column to store different types of information (last name if person, company name if company). A nightmare, I know, but it's what it is.
To distinguish the meaning, there is another column with an integer that specifies the type of what's in the name column.
The schema of this table looks as follows (simplified):
ID int
dtype int
name varchar(50)
So, a sample could look like this:
ID dtype name
---------------------------
1 0 Smith
2 0 Trump
3 1 ABC Ltd.
4 1 XYZ Ltd.
I'm trying to normalize this using the following T-SQL code:
WITH companies AS
(
SELECT ID, name AS company
FROM nametable WHERE dtype=1
),
people AS
(
SELECT ID, name AS person
FROM nametable WHERE dtype=0
)
SELECT * FROM companies UNION ALL SELECT * FROM people;
What I hoped to get is a new table with the schema:
ID
dtype
company
person
Or, in table view:
ID   dtype   person   company
------------------------------------------
1    0       Smith
2    0       Trump
3    1                ABC Ltd.
4    1                XYZ Ltd.
Instead, the field is now just called person instead of name but it's still just one field for 2 types of information.
I understand I could just create a new table and insert each partial result into it but it seems there should be a simpler way. Any advice appreciated.

It seems you need a CASE expression:
select ID, dtype, case when dtype=0 then name end AS person,
case when dtype=1 then name end AS company
FROM nametable
A CASE expression goes through its conditions and returns a value when one of them is met. From your sample input and output it's clear you want a separate column per type, so I used CASE.

You don't need to use UNION for this at all. A better approach would be using a bit of aggregation.
SELECT ID,
MAX(CASE WHEN dtype = 0 THEN [name] END) AS person,
MAX(CASE WHEN dtype = 1 THEN [name] END) AS company
FROM nametable
GROUP BY ID;
UNION (ALL) doesn't "care" about aliases, though. It combines the datasets it receives into one. All the datasets must have the same definition, and the dataset returned will have that definition too. If the datasets have different aliases for their columns, the aliases supplied in the first dataset are used. UNION doesn't detect that the columns are named differently and return them as separate columns; that's not what a UNION does.
Edit: well, this will give the OP the data they want; however, there's no need for the aggregation. I was honestly expecting IDs to be a shared resource, because that's normally the only time you have such horrid tables. The fact that they aren't just makes this table even more confusing...
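To make the behaviour described above concrete, here is a minimal sketch that runs both queries against the sample data. It assumes SQLite (through Python's sqlite3 module) as a stand-in for SQL Server, so the T-SQL bracket quoting is dropped; the column-naming rule for UNION ALL is the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE nametable (ID INTEGER, dtype INTEGER, name VARCHAR(50));
    INSERT INTO nametable VALUES
        (1, 0, 'Smith'), (2, 0, 'Trump'), (3, 1, 'ABC Ltd.'), (4, 1, 'XYZ Ltd.');
""")

# UNION ALL stacks rows by position; the result keeps the first SELECT's names.
union_cur = conn.execute("""
    SELECT ID, name AS company FROM nametable WHERE dtype = 1
    UNION ALL
    SELECT ID, name AS person FROM nametable WHERE dtype = 0
""")
union_columns = [d[0] for d in union_cur.description]

# CASE splits the single name column into two columns, one per dtype.
case_cur = conn.execute("""
    SELECT ID, dtype,
           CASE WHEN dtype = 0 THEN name END AS person,
           CASE WHEN dtype = 1 THEN name END AS company
    FROM nametable
    ORDER BY ID
""")
case_columns = [d[0] for d in case_cur.description]
case_rows = case_cur.fetchall()

print(union_columns)
print(case_columns)
for row in case_rows:
    print(row)
```

The union still has only two columns, with the second one named after the first branch's alias, while the CASE query produces separate person and company columns with NULL where the type does not apply.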

Related

How to aggregate data stored column-wise in a matrix table

I have a table. Ellipses (...) represent multiple columns of a similar type.
TABLE: diagnosis_info
COLUMNS: visit_id,
patient_diagnosis_code_1 ...
patient_diagnosis_code_100 -- char(100) with a value of ‘0’ or ‘1’
How do I find the most common diagnosis_code? There are 101 columns including the visit_id. The table is like a matrix table of 0s and 1s. How do I write something that can dynamically account for all the columns and count all the rows where the value is 1?
What I would normally do is not feasible as there are too many columns:
SELECT COUNT(patient_diagnosis_code_1), COUNT(patient_diagnosis_code_2),... FROM diagnosis_info WHERE patient_diagnosis_code_1 = '1' and patient_diagnosis_code_2 = '1' and ....
Then, even if I typed all that out, how would I select which column had the highest count of values = 1? The table is column oriented rather than row oriented.
Unfortunately your data design is bad from the start. Instead it could be as simple as:
patient_id, visit_id, diagnosis_code
where a patient with 1 diagnosis code would have 1 row and a patient with 100 diagnosis codes would have 100 rows. At any given time you could transpose this into the format you presented (this is called a pivot or cross tab). Also, in some databases, for example PostgreSQL, you could put all those diagnosis codes into an array field; then it would look like:
patient_id, visit_id, diagnosis_code (data type -bool or int- array)
Now you need the reverse of it, which is called unpivot. Some databases, SQL Server for example, have an UNPIVOT operator for this.
Without knowing what your backend is, you could do that with an ugly SQL like:
select code, pdc
from
(
select 1 as code, count(*) as pdc
from myTable where patient_diagnosis_code_1=1
union
select 2 as code, count(*) as pdc
from myTable where patient_diagnosis_code_2=1
union
...
select 100 as code, count(*) as pdc
from myTable where patient_diagnosis_code_100=1
) tmp
order by pdc desc, code;
PS: This returns all the codes with their frequencies, ordered from most to least frequent. You could limit the result to 1 row to get the max (or use a limit that keeps ties, in case more than one code matches the max).
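Nobody wants to type 100 branches by hand, so the query text can be generated. The following is a rough sketch, assuming SQLite via Python's sqlite3 module, the column naming pattern from the question, and a made-up three-column table with four visits (the real table has 100 code columns). The column names are generated by the script itself, not taken from user input, which is why plain string formatting is acceptable here.

```python
import sqlite3

N_CODES = 3  # the question has 100; 3 keeps the demo small

conn = sqlite3.connect(":memory:")
cols = [f"patient_diagnosis_code_{i}" for i in range(1, N_CODES + 1)]
conn.execute(
    "CREATE TABLE diagnosis_info (visit_id INTEGER, "
    + ", ".join(f"{c} CHAR(1)" for c in cols)
    + ")"
)
# Hypothetical sample rows: visit_id followed by one '0'/'1' flag per code.
conn.executemany(
    "INSERT INTO diagnosis_info VALUES (" + ", ".join("?" * (N_CODES + 1)) + ")",
    [(1, "1", "0", "1"), (2, "0", "0", "1"), (3, "1", "0", "1"), (4, "0", "1", "0")],
)

# One counting SELECT per column, glued together with UNION ALL.
branches = [
    f"SELECT {i} AS code, COUNT(*) AS pdc FROM diagnosis_info WHERE {c} = '1'"
    for i, c in enumerate(cols, start=1)
]
query = " UNION ALL ".join(branches) + " ORDER BY pdc DESC, code"

counts = conn.execute(query).fetchall()
print(counts)  # list of (code, frequency), most frequent first
```

UNION ALL is used instead of UNION because every branch carries a distinct code, so there are no duplicates to remove.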

Querying SQL table with different values in same column with same ID

I have a SQL Server 2012 table with ID, first name and last name. The ID should be unique per person, but due to an error in the historical feed, different people were assigned the same ID.
------------------------------
ID FirstName LastName
------------------------------
1 ABC M
1 ABC M
1 ABC M
1 ABC N
2 BCD S
3 CDE T
4 DEF T
4 DEG T
In this case, the people with ID 1 are different (their last names clearly differ) but they have the same ID. How do I query for this? The table has millions of rows. If it were a smaller table, I would probably have queried all IDs with a count > 1 and filtered them in Excel.
What I am trying to do is get a list of all such IDs that have been assigned to two different users.
Any ideas or help would be much appreciated.
Edit: I don't think I framed the question very well.
There are two IDs which are present multiple times: 1 and 4. The rows with ID 4 are identical. I don't want these in my result. For the rows with ID 1, although the first name is the same, the last name is different for one row. I want only those IDs where the ID is the same but the first or last name differs.
I tried loading the IDs which have multiple occurrences into a temp table and comparing it against the parent table, albeit unsuccessfully. Any other ideas that I can try and implement?
SELECT
ID
FROM
<<Table>>
GROUP BY
ID
HAVING
COUNT(*) > 1;
SELECT *
FROM myTable
WHERE ID IN (
SELECT ID
FROM myTable
GROUP BY ID
HAVING MAX(LastName) <> MIN(LastName) OR MAX(FirstName) <> MIN(FirstName)
)
ORDER BY ID, LASTNAME
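The difference between the two queries above is easier to see side by side. This is a small sketch, assuming SQLite via Python's sqlite3 module and the sample rows from the question, plus one hypothetical ID 5 with two truly identical rows (not in the original data) to show what COUNT(*) > 1 over-reports. Note that in the sample as posted, the two ID 4 rows differ in first name (DEF vs DEG), so the MIN/MAX test flags ID 4 as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (ID INTEGER, FirstName TEXT, LastName TEXT)")
conn.executemany(
    "INSERT INTO myTable VALUES (?, ?, ?)",
    [
        (1, "ABC", "M"), (1, "ABC", "M"), (1, "ABC", "M"), (1, "ABC", "N"),
        (2, "BCD", "S"),
        (3, "CDE", "T"),
        (4, "DEF", "T"), (4, "DEG", "T"),
        (5, "EFG", "U"), (5, "EFG", "U"),  # hypothetical exact duplicates
    ],
)

# Every ID that occurs more than once, identical rows included.
dup_ids = [r[0] for r in conn.execute(
    "SELECT ID FROM myTable GROUP BY ID HAVING COUNT(*) > 1 ORDER BY ID"
)]

# Only IDs whose rows disagree on first or last name.
conflict_ids = [r[0] for r in conn.execute("""
    SELECT ID FROM myTable
    GROUP BY ID
    HAVING MAX(LastName) <> MIN(LastName) OR MAX(FirstName) <> MIN(FirstName)
    ORDER BY ID
""")]

print(dup_ids)
print(conflict_ids)
```

If MIN and MAX of a column are equal within a group, every non-NULL value in that group is the same, which is why the comparison works as a "values differ" test.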

Can IBM DB2 return a 0 (zero) when no records are found?

for example :
I have a table with student ID and student grades
-----------------------
ID | grades
-----------------------
1 | 80
2 | 28
-----------------------
I want to get 0 when I query for ID = 3.
Can I do that?
For example: select grades from student where id = 3
I want to get 0 because that ID is not in the table.
Run a select with the COUNT aggregate function:
select count(*) from STUDENT.GRADES where ID=3
This returns 0 when no row matches. Note that when the ID does exist, it returns the number of matching rows, not the grade.
Maybe this will do what you want:
SELECT ID, MAX(Grades)
FROM (SELECT ID, Grades FROM Students WHERE ID = 3
UNION
VALUES (3, 0) -- Not certain of syntax here
)
GROUP BY ID
The basic idea is that students present in the table will have two rows and the MAX will pick their proper grade (assuming that there are no circumstances where the grade is coded as a negative value). Students that are not represented will have just the one row with a grade of 0. The repeated 3 is the ID of the student being sought.
Have fun chasing down the full syntax. I started at Queries in the DB2 9.7 Information Centre, but ran out of patience before I got a good answer — and I don't have DB2 to experiment on. You might need to write SELECT ID, Grades FROM VALUES (3, 0), or there might be some other magical incantation that does the job. You could probably use SELECT 3 AS ID, 0 AS Grades FROM SYSIBM.SYSTABLES WHERE TABID = 1, but that's a clumsy expression.
I've kept with the column name Grades (plural) even though it looks like it contains one grade. It is depressing how often people ask questions about anonymous tables.
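For what it's worth, the UNION plus MAX idea can be checked end to end without a DB2 install. This is only a sketch, assuming SQLite via Python's sqlite3 module, so the DB2 specific spelling of the constant row remains unverified; here it is written as a plain SELECT of a bound parameter and a literal 0. The helper name grade_or_zero is made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER, grades INTEGER)")
conn.executemany("INSERT INTO student VALUES (?, ?)", [(1, 80), (2, 28)])

# The second branch always contributes one (id, 0) row, so the outer
# query returns exactly one row whether or not the student exists.
QUERY = """
    SELECT id, MAX(grades) AS grades
    FROM (SELECT id, grades FROM student WHERE id = :id
          UNION ALL
          SELECT :id, 0) AS t
    GROUP BY id
"""

def grade_or_zero(student_id):
    """Return the student's grade, or 0 if the ID is not in the table."""
    return conn.execute(QUERY, {"id": student_id}).fetchone()[1]

for sid in (1, 2, 3):
    print(sid, grade_or_zero(sid))
```

As the answer says, this relies on real grades never being negative; otherwise MAX would pick the filler 0 over the stored value.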

What's the simplest way to sub-query a variable number of rows into fields of the parent query?

What's the simplest way to sub-query a variable number of rows into fields of the parent query?
PeopleTBL
NameID int - unique
Name varchar
Data: 1,joe
2,frank
3,sam
HobbyTBL
HobbyID int - unique
HobbyName varchar
Data: 1,skiing
2,swimming
HobbiesTBL
NameID int
HobbyID int
Data: 1,1
2,1
2,2
The app defines 0-2 Hobbies per NameID.
What's the simplest way to query the Hobbies into fields retrieved with "Select * from PeopleTBL"?
Result desired based on above data:
NameID Name Hobby1 Hobby2
1 joe skiing
2 frank skiing swimming
3 sam
I'm not sure if I understand correctly, but if you want to fetch all the hobbies for a person in one row, the following query might be useful (MySQL):
SELECT NameID, Name, GROUP_CONCAT(HobbyName) AS Hobbies
FROM PeopleTBL
JOIN HobbiesTBL USING (NameID)
JOIN HobbyTBL USING (HobbyID)
GROUP BY NameID, Name
Hobbies column will contain all hobbies of a person separated by ,.
See documentation for GROUP_CONCAT for details.
I don't know what engine you are using, so I've provided an example with MySQL (I don't know which other SQL engines support this).
Select P.NameId, P.Name
, Min( Case When H2.HobbyId = 1 Then H.HobbyName End ) As Hobby1
, Min( Case When H2.HobbyId = 2 Then H.HobbyName End ) As Hobby2
From HobbyTbl As H
Join HobbiesTbl As H2
On H2.HobbyId = H.HobbyId
Join PeopleTbl As P
On P.NameId = H2.NameId
Group By P.NameId, P.Name
What you are seeking is called a crosstab query. As long as the columns are static, you can use the above solution. However, if you want to build the columns dynamically, you need to build the SQL statement in middle-tier code or use a reporting tool.
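Here is a quick way to try the static crosstab against the question's data. It is a sketch only, assuming SQLite via Python's sqlite3 module. One deliberate change from the query above: it starts from PeopleTBL and uses LEFT JOINs, so that sam, who has no hobbies, still appears in the result as the desired output shows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PeopleTBL (NameID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE HobbyTBL (HobbyID INTEGER PRIMARY KEY, HobbyName TEXT);
    CREATE TABLE HobbiesTBL (NameID INTEGER, HobbyID INTEGER);
    INSERT INTO PeopleTBL VALUES (1, 'joe'), (2, 'frank'), (3, 'sam');
    INSERT INTO HobbyTBL VALUES (1, 'skiing'), (2, 'swimming');
    INSERT INTO HobbiesTBL VALUES (1, 1), (2, 1), (2, 2);
""")

# Conditional aggregation: each CASE picks out one hobby, MIN collapses
# the group to a single value (NULL when the person lacks that hobby).
rows = conn.execute("""
    SELECT P.NameID, P.Name,
           MIN(CASE WHEN H2.HobbyID = 1 THEN H.HobbyName END) AS Hobby1,
           MIN(CASE WHEN H2.HobbyID = 2 THEN H.HobbyName END) AS Hobby2
    FROM PeopleTBL AS P
    LEFT JOIN HobbiesTBL AS H2 ON H2.NameID = P.NameID
    LEFT JOIN HobbyTBL AS H ON H.HobbyID = H2.HobbyID
    GROUP BY P.NameID, P.Name
    ORDER BY P.NameID
""").fetchall()

for row in rows:
    print(row)
```

Each output column is tied to a fixed HobbyID here, so Hobby1 always means skiing and Hobby2 always means swimming; that is what "static columns" means in practice.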

SQL select replace integer with string

The goal is to replace an integer value that is returned in a SQL query with the char value that the number represents. For example:
A table attribute labeled ‘Sport’ is defined as a integer value between 1-4. 1 = Basketball, 2 = Hockey, etc. Below is the database table and then the desired output.
Database Table:
Player Team Sport
--------------------------
Bob Blue 1
Roy Red 3
Sarah Pink 4
Desired Outputs:
Player Team Sport
------------------------------
Bob Blue Basketball
Roy Red Soccer
Sarah Pink Kickball
What is best practice to translate these integer values for String values? Use SQL to translate the values prior to passing to program? Use scripting language to change the value within the program? Change database design?
The database should hold the values and you should perform a join to another table which has that data in it.
So you should have a table which has say a list of people
ID Name FavSport
1 Alex 4
2 Gnats 2
And then another table which has a list of the sports
ID Sport
1 Basketball
2 Football
3 Soccer
4 Kickball
Then you would do a join between these tables
select people.name, sports.sport
from people, sports
where people.favsport = sports.ID
which would give you back
Name Sport
Alex Kickball
Gnats Football
You could also use a case statement eg. just using the people table from above you could write something like
select name,
case
when favsport = 1 then 'Basketball'
when favsport = 2 then 'Football'
when favsport = 3 then 'Soccer'
else 'Kickball'
end as "Sport"
from people
But that is certainly not best practice.
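One way to see why the hard-coded CASE is the weaker option is to add a new sport and rerun both queries. The sketch below assumes SQLite via Python's sqlite3 module and the two tables from this answer; the fifth sport (Tennis) and the person Sam are invented for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (ID INTEGER PRIMARY KEY, name TEXT, favsport INTEGER);
    CREATE TABLE sports (ID INTEGER PRIMARY KEY, sport TEXT);
    INSERT INTO people VALUES (1, 'Alex', 4), (2, 'Gnats', 2);
    INSERT INTO sports VALUES
        (1, 'Basketball'), (2, 'Football'), (3, 'Soccer'), (4, 'Kickball');
""")

JOIN_SQL = """
    SELECT people.name, sports.sport
    FROM people JOIN sports ON people.favsport = sports.ID
    ORDER BY people.ID
"""
CASE_SQL = """
    SELECT name,
           CASE WHEN favsport = 1 THEN 'Basketball'
                WHEN favsport = 2 THEN 'Football'
                WHEN favsport = 3 THEN 'Soccer'
                ELSE 'Kickball'
           END AS sport
    FROM people
    ORDER BY ID
"""

before_join = conn.execute(JOIN_SQL).fetchall()
before_case = conn.execute(CASE_SQL).fetchall()

# Hypothetical change: a new sport arrives as data only.
conn.execute("INSERT INTO sports VALUES (5, 'Tennis')")
conn.execute("INSERT INTO people VALUES (3, 'Sam', 5)")

after_join = conn.execute(JOIN_SQL).fetchall()
after_case = conn.execute(CASE_SQL).fetchall()

print(after_join)
print(after_case)
```

Both queries agree on the original data, but after the insert only the join picks up the new name; the CASE query silently falls through to its ELSE branch until someone edits the SQL.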
MySQL has a CASE statement. The following works in SQL Server:
SELECT
CASE MyColumnName
WHEN 1 THEN 'First'
WHEN 2 THEN 'Second'
WHEN 3 THEN 'Third'
ELSE 'Other'
END
In Oracle you can use the DECODE function, which provides a solution where the design of the database is beyond your control.
Directly from the oracle documentation:
Example: This example decodes the value warehouse_id. If warehouse_id is 1, then the function returns 'Southlake'; if warehouse_id is 2, then it returns 'San Francisco'; and so forth. If warehouse_id is not 1, 2, 3, or 4, then the function returns 'Non domestic'.
SELECT product_id,
DECODE (warehouse_id, 1, 'Southlake',
2, 'San Francisco',
3, 'New Jersey',
4, 'Seattle',
'Non domestic') "Location"
FROM inventories
WHERE product_id < 1775
ORDER BY product_id, "Location";
The CASE expression could help. However, it may be even faster to have a small table with an int primary key and a name string such as
1 baseball
2 football
etc, and JOIN it appropriately in the query.
Do you think it would be helpful to store these relationships between integers and strings in the database itself? As long as you have to store these relationships, it makes sense to store it close to your data (in the database) instead of in your code where it can get lost. If you use this solution, this would make the integer a foreign key to values in another table. You store integers in another table, say sports, with sport_id and sport, and join them as part of your query.
Instead of a plain SELECT * FROM my_table, you would SELECT from my_table and use the appropriate join. If not every row in your main table has a corresponding sport, you could use a left join; otherwise selecting from both tables and using = in the where clause is probably sufficient.
Definitely have the DB hold the string values. I am not a DB expert by any means, but I would recommend that you create a table that holds the strings and their corresponding integer values. From there, you can define a relationship between the two tables and then do a JOIN in the select to pull the string version of the integer.
tblSport Columns
------------
SportID int (PK, eg. 12)
SportName varchar (eg. "Tennis")
tblFriend Columns
------------
FriendID int (PK)
FriendName (eg. "Joe")
LikesSportID (eg. 12)
In this example, you can get the following result from the query below:
SELECT FriendName, SportName
FROM tblFriend
INNER JOIN tblSport
ON tblFriend.LikesSportID = tblSport.SportID
Man, it's late - I hope I got that right. By the way, you should read up on the different types of joins - this is the simplest example of one.
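As a starting point for comparing join types, here is a small sketch of INNER JOIN versus LEFT JOIN on the two tables above. It assumes SQLite via Python's sqlite3 module; the second friend, Ann, who has no sport set, is made up to show the difference.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblSport (SportID INTEGER PRIMARY KEY, SportName TEXT);
    CREATE TABLE tblFriend (
        FriendID INTEGER PRIMARY KEY,
        FriendName TEXT,
        LikesSportID INTEGER REFERENCES tblSport (SportID)
    );
    INSERT INTO tblSport VALUES (12, 'Tennis');
    INSERT INTO tblFriend VALUES (1, 'Joe', 12), (2, 'Ann', NULL);
""")

# INNER JOIN keeps only friends that have a matching sport row.
inner_rows = conn.execute("""
    SELECT FriendName, SportName
    FROM tblFriend
    INNER JOIN tblSport ON tblFriend.LikesSportID = tblSport.SportID
    ORDER BY FriendID
""").fetchall()

# LEFT JOIN keeps every friend and fills SportName with NULL if unmatched.
left_rows = conn.execute("""
    SELECT FriendName, SportName
    FROM tblFriend
    LEFT JOIN tblSport ON tblFriend.LikesSportID = tblSport.SportID
    ORDER BY FriendID
""").fetchall()

print(inner_rows)
print(left_rows)
```

Which one is right depends on whether a friend without a sport should disappear from the result or show up with an empty sport.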