Transpose table in SQL

I have a table set up in the following manner.
CustomerNumber June15_Order June15_Billing July15_Order July15_Billing August15_Order August15_Billing
12345 5 55 3 45
5431 6 66 5 67
I would prefer it to be:
CustomerNumber Date Order Billing
12345 01/06/2015 5 55
12345 01/07/2015 3 45
5431 01/06/2015 6 66
5431 01/07/2015 5 67
Any thoughts as to how I would accurately transpose this table?

If you're just trying to get the old data into the new, you'll basically need to use brute force:
INSERT INTO NewTable
(CustomerNumber, [Date], [Order], Billing)
(
SELECT CustomerNumber, '06/15/2015', June15_Order, June15_Billing
FROM OldTable
UNION
SELECT CustomerNumber, '07/15/2015', July15_Order, July15_Billing
FROM OldTable
UNION
SELECT CustomerNumber, '08/15/2015', August15_Order, August15_Billing
FROM OldTable
)

Presuming there can be columns for any month and any year, this gets ugly really fast. If the columns are fixed and hard-coded, use @John Pasquet's solution (+1). If you need the ability to work with any set of columns of the form MonthYY_Type (like June15_Order), here's an outline.
First pass:
Write a SELECT... UNPIVOT... query to transform the table
Map the resulting "label" column to a Date datatype and a "Type" (Order, Billing)
However, mapping result set column names of "July15" to "Jul 1, 2015" (or 01/07/2015) is hard, if not crazy hard. This leads to a second pass:
Build a "lookup" list of columns from sys.tables and sys.colmns
Pick out those that are to be unpivoted
Figure out the dates and types for each of them
Build the SELECT... UNPIVOT... in dynamic SQL, dumping the results to a temp table
Join this temp table to the lookup list by original column name, which (via the join) gets you the prepared date and type values
Seriously, this could get ridiculously complex. The smart money is on rebuilding the table with columns for date and type.
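For reference, a minimal sketch of the first-pass static UNPIVOT described above (SQL Server syntax), assuming only the three hard-coded months shown in the question; the dynamic version would build the column list from sys.columns instead:
SELECT CustomerNumber, ColName, ColValue
FROM OldTable
UNPIVOT (ColValue FOR ColName IN
    (June15_Order, June15_Billing,
     July15_Order, July15_Billing,
     August15_Order, August15_Billing)) AS u;
-- ColName (e.g. 'June15_Order') still has to be split into a date and a
-- type (Order/Billing) afterwards, which is the hard part described above.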

First create a new table with the desired structure. After that you will need to create a stored procedure for the task, which iterates over all rows.
For the columns where you know the old-to-new mapping, just take the value and save it in a variable. For the others you will need a condition for each month (like "contains June") and save the date and the value in two variables; then, each time you find a month with a value > 0, perform an insert into the new table with all the variables.

Related

How to aggregate data stored column-wise in a matrix table

I have a table; ellipses (...) represent multiple columns of a similar type:
TABLE: diagnosis_info
COLUMNS: visit_id,
patient_diagnosis_code_1 ...
patient_diagnosis_code_100 -- char(100) with a value of '0' or '1'
How do I find the most common diagnosis_code? There are 101 columns including the visit_id. The table is like a matrix table of 0s and 1s. How do I write something that can dynamically account for all the columns and count all the rows where the value is 1?
What I would normally do is not feasible as there are too many columns:
SELECT COUNT(patient_diagnosis_code_1), COUNT(patient_diagnosis_code_2), ... FROM diagnosis_info WHERE patient_diagnosis_code_1 = '1' AND patient_diagnosis_code_2 = '1' AND ...
Then, even if I typed all that out, how would I select which column had the highest count of values = 1? The table is more column-oriented than row-oriented.
Unfortunately your data design is bad from the start. Instead it could be as simple as:
patient_id, visit_id, diagnosis_code
where a patient with 1 diagnosis code would have 1 row, a patient with 100 diagnosis codes 100 rows, and so on. At any given time you could transpose this into the format you presented (what is called a pivot or cross tab). Also, in some databases, for example PostgreSQL, you could put all those diagnosis codes into an array field, in which case it would look like:
patient_id, visit_id, diagnosis_code (data type: bool or int array)
Now you need the reverse of that, which is called unpivot. Some databases, SQL Server for example, have an UNPIVOT operator.
Without knowing what your backend is, you could do that with ugly SQL like:
select code, pdc
from
(
select 1 as code, count(*) as pdc
from diagnosis_info where patient_diagnosis_code_1 = '1'
union
select 2 as code, count(*) as pdc
from diagnosis_info where patient_diagnosis_code_2 = '1'
union
...
select 100 as code, count(*) as pdc
from diagnosis_info where patient_diagnosis_code_100 = '1'
) tmp
order by pdc desc, code;
PS: This returns all the codes with their frequencies, ordered from most to least. You could limit the result to 1 row to get just the max (watching out for ties, in case more than one code matches the max).
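For what it's worth, on SQL Server the UNPIVOT route mentioned above could look roughly like this (a sketch; the 100-column list is abbreviated and would normally be generated dynamically from sys.columns, and code here holds the column name rather than a number):
select code, count(*) as pdc
from diagnosis_info
unpivot (flag for code in (patient_diagnosis_code_1,
                           patient_diagnosis_code_2
                           -- ..., patient_diagnosis_code_100
                           )) as u
where flag = '1'
group by code
order by pdc desc;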

SQL - Insert using Column based on SELECT result

I currently have a table called tempHouses that looks like:
avgprice | dates | city
dates are stored as yyyy-mm-dd
However I need to move the records from that table into a table called houses that looks like:
city | year2002 | year2003 | year2004 | year2005 | year2006
The information in tempHouses contains average house prices from 1995 - 2014.
I know I can use SUBSTRING to get the year from the dates:
SUBSTRING(dates, 0, 4)
So basically for each city in tempHouses.city I need to get the the average house price from the above years into one record.
Any ideas on how I would go about doing this?
This is a SQL Server approach, and a PIVOT may be better, but here's one way:
SELECT City,
       AVG(year2002) AS year2002,
       AVG(year2003) AS year2003,
       AVG(year2004) AS year2004
FROM (
    SELECT City,
           CASE WHEN Dates BETWEEN '2002-01-01T00:00:00' AND '2002-12-31T23:59:59' THEN avgprice
                ELSE NULL -- NULL rather than 0, so rows from other years don't drag the average down
           END AS year2002,
           CASE WHEN Dates BETWEEN '2003-01-01T00:00:00' AND '2003-12-31T23:59:59' THEN avgprice
                ELSE NULL
           END AS year2003,
           CASE WHEN Dates BETWEEN '2004-01-01T00:00:00' AND '2004-12-31T23:59:59' THEN avgprice
                ELSE NULL
           END AS year2004
           -- Repeat for each year
    FROM tempHouses
) AS t
GROUP BY City
The inner query gets the data into the correct format for each record (City, year2002, year2003, year2004), whilst the outer query gets the average for each City.
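Since PIVOT was mentioned, here is a rough SQL Server sketch of that route (assuming dates converts cleanly to a date type; table and column names are taken from the question):
SELECT City,
       [2002] AS year2002, [2003] AS year2003, [2004] AS year2004
FROM (
    SELECT City, YEAR(CAST(Dates AS date)) AS yr, avgprice
    FROM tempHouses
) AS src
PIVOT (AVG(avgprice) FOR yr IN ([2002], [2003], [2004])) AS p;
-- Extend both year lists through 2006 (or 2014) as needed.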
There may be many ways to do this, and performance may be the deciding factor on which one to choose.
The best way would be to use a script to run the queries for you, because you will need to run them multiple times, extracting the data year by year. Make sure that the only required columns in the new table are the city and a row id:
http://dev.mysql.com/doc/refman/5.0/en/insert-select.html
INSERT INTO <table> (city) SELECT DISTINCT `city` FROM <old_table>;
Then for each city extract the average values, insert them into a temporary table and then insert into the main table.
SELECT AVG(avgprice), SUBSTRING(dates, 1, 4) AS yr FROM <old_table> GROUP BY SUBSTRING(dates, 1, 4);
Otherwise you're looking at a combination query using joins and potentially unions to extrapolate the data. Because you're flattening the table into a single row per city it's going to be a little tough to do. You should create indexes first on the date column if you don't want the database query to fail with memory limits or just take a very long time to execute.
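Alternatively, each per-year pass could be a single MySQL multi-table UPDATE against the new table, skipping the temporary table (a sketch; houses and its year2002 column come from the question, and it assumes houses already holds one row per city):
UPDATE houses h
JOIN (
    SELECT city, AVG(avgprice) AS avg_price
    FROM tempHouses
    WHERE SUBSTRING(dates, 1, 4) = '2002'
    GROUP BY city
) t ON t.city = h.city
SET h.year2002 = t.avg_price;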

Overwrite data in one table with data from another if two keys match

EDIT: I'm using the PROC SQL functionality in SAS.
I'm trying to overwrite data in a primary table with data in a secondary table if two IDs match. Basically, there is a process modifying certain values associated with various IDs, and after that process is done I want to update the values associated with those IDs in the primary table. For a very simplified example:
Primary table:
PROD_ID PRICE IN_STOCK
1 5.25 17
2 10.24 200 [...additional fields...]
3 6.42 140
...
Secondary table:
PROD_ID PRICE IN_STOCK
2 11.50 175
3 6.42 130
And I'm trying to get the new Primary table to look like this:
PROD_ID PRICE IN_STOCK
1 5.25 17
2 11.50 175 [...additional fields...]
3 6.42 130
...
So it overwrites certain columns in the primary table if the keys match.
In non-working SQL code, what I'm trying to do is something like this:
INSERT INTO PRIMARY_TABLE (PRICE, IN_STOCK)
SELECT PRICE, IN_STOCK
FROM SECONDARY_TABLE
WHERE SECONDARY_TABLE.PROD_ID = PRIMARY_TABLE.PROD_ID
Is this possible to do in one statement like this, or will I have to figure out some workaround using temporary tables (which is something I'm trying to avoid)?
EDIT: None of the current answers seem to be working, although it's probably my fault - I'm using PROC SQL in SAS and didn't specify, so is it possible some of the functionality is missing? For example, the "FROM" keyword doesn't turn blue when using UPDATE, and throws errors when trying to run it, but the UPDATE and SET seem fine...
Do you really want to insert new data? Or update existing rows? If updating, join the tables:
UPDATE PT
SET
PT.PRICE = ST.PRICE,
PT.IN_STOCK = ST.IN_STOCK
FROM
PRIMARY_TABLE PT JOIN SECONDARY_TABLE ST ON PT.PROD_ID = ST.PROD_ID
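Note that SAS PROC SQL does not support the FROM clause in UPDATE, which is likely why the above throws errors. The ANSI form with correlated subqueries is a possible workaround in PROC SQL (a sketch, untested; it only touches rows that have a match in the secondary table):
UPDATE PRIMARY_TABLE
SET PRICE = (SELECT PRICE FROM SECONDARY_TABLE
             WHERE SECONDARY_TABLE.PROD_ID = PRIMARY_TABLE.PROD_ID),
    IN_STOCK = (SELECT IN_STOCK FROM SECONDARY_TABLE
                WHERE SECONDARY_TABLE.PROD_ID = PRIMARY_TABLE.PROD_ID)
WHERE PROD_ID IN (SELECT PROD_ID FROM SECONDARY_TABLE);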
One answer in SAS PROC SQL is simply to do it as a left join and use COALESCE, which picks the first nonmissing value:
data class;
set sashelp.class;
run;
data class_updates;
input name $ height weight;
datalines;
Alfred 70 150
Alice 59 92
Henry 65 115
Judy 66 95
;;;;
run;
proc sql;
create table class as select C.name, coalesce(U.height,C.height) as height, coalesce(U.weight,C.weight) as weight
from class C
left join class_updates U
on C.name=U.name;
quit;
In this case, though, the SAS solution outside of SQL is superior in terms of simplicity of coding.
data class;
update class class_updates(in=u);
by name;
run;
This does require both tables to be sorted. There are a host of different ways of doing this (hash table, format lookup, etc.) if you have performance needs.
Try this:
INSERT INTO PRIMARY_TABLE (PRICE, IN_STOCK)
SELECT SECONDARY_TABLE.PRICE, SECONDARY_TABLE.IN_STOCK
FROM SECONDARY_TABLE
JOIN PRIMARY_TABLE ON SECONDARY_TABLE.PROD_ID = PRIMARY_TABLE.PROD_ID
The only reason you would have to use an INSERT statement is if there are IDs present in the secondary table and not present in the primary table. If this is not the case then use a regular UPDATE statement. If it is the case then use the following (MySQL syntax):
INSERT INTO PRIMARY_TABLE (PROD_ID, PRICE, IN_STOCK)
SELECT PROD_ID, PRICE, IN_STOCK
FROM SECONDARY_TABLE s
ON DUPLICATE KEY UPDATE PRICE = s.PRICE, IN_STOCK = s.IN_STOCK

Insert and update rows from a file in oracle

I have a file in linux, the file is something like:
(I have millions of rows)
date number name id state
20131110 1089 name1 123 start
20131110 1080 name2 122 start
20131110 1082 name3 121 start
20131114 1089 name1 120 end
20131115 1082 name3 119 end
And I have a table in Oracle with the following fields:
init_table
start_date
end_date
number
name
id
The problem is that I read I can insert the data with SQL*Loader. (I have millions of rows, so creating a temporary table to insert into and then updating the other table with a trigger is not a good option.) Each number has a start date and, later in the file, an end date; for example, number 1089 has start date 20131110 and end date 20131114. So I need to insert the start_date into my table first, and later, when I find the end_date, update the row for the number I inserted (1089 in my example) with the end date 20131114.
How can I do this with a .ctl file, or with something else?
Can anyone help? Thanks.
What version of Oracle?
I would use an external table. Define an external table that exactly matches your flat file. Then, you should be able to solve this with two passes, one insert, one update.
Something like this should do it:
insert into init_table
  select to_date(date,'YYYYMMDD'), null, number, name, id from external_table where state='start';
update init_table
  set end_date = (select to_date(date,'YYYYMMDD') from external_table where state='end' and init_table.number=external_table.number);
Note that you can't actually have columns named 'date' or 'number', so, the sql above isn't actually going to work as written. You'll have to change those column names.
Hope that helps...
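For reference, a rough sketch of what the external table definition could look like (assumptions only: an existing Oracle directory object here called data_dir, a whitespace-delimited file data.txt whose header line is skipped with SKIP 1, and renamed columns since date and number are reserved words):
create table external_table (
  load_date  varchar2(8),
  num        number,
  name       varchar2(30),
  id         number,
  state      varchar2(10)
)
organization external (
  type oracle_loader
  default directory data_dir
  access parameters (
    records delimited by newline
    skip 1
    fields terminated by whitespace
  )
  location ('data.txt')
);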
If you use an external table approach then you can join the data in the external table to itself to produce a single record that can then be inserted. Although the join is expensive, overall it ought to be an efficient process as long as the hash join I'd expect to be used stays in memory.
So something like ...
insert into init_table (
start_date,
end_date,
number,
name,
id)
select
s.date,
e.date,
s.number,
s.name,
s.id
from external_table s
join external_table e on s.number = e.number
where
s.state = 'start' and
e.state = 'end'
That assumes that there will always be an end date for every start date, and that the number does not already exist in the table -- if either of those conditions is not true then an outer join would be required in the former case, and a merge required in the latter.
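And if a number might already exist in init_table, the merge mentioned above could look roughly like this (again just a sketch, assuming the date/number columns have been renamed to something legal such as load_date/num):
merge into init_table t
using (
  select s.num, s.name, s.id,
         to_date(s.load_date, 'YYYYMMDD') as start_dt,
         to_date(e.load_date, 'YYYYMMDD') as end_dt
  from external_table s
  left join external_table e
    on e.num = s.num and e.state = 'end'
  where s.state = 'start'
) src
on (t.num = src.num)
when matched then update
  set t.end_date = src.end_dt
when not matched then insert (start_date, end_date, num, name, id)
  values (src.start_dt, src.end_dt, src.num, src.name, src.id);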
$ cat sqlldrNew.ctl
load data
infile '/home/Sameer/employee.txt'
into table employee
fields terminated by X'9'
( date, -->select number from employee where name=[Name from the file record], name, id, state )
$ sqlldr scott/tiger control=/home/Sameer/sqlldrNew.ctl
I think this should work.

I DISTINCTly hate MySQL (help building a query)

This is straightforward, I believe:
I have a table with 30,000 rows. When I SELECT DISTINCT `location` FROM myTable it returns 21,000 rows, about what I'd expect, but it only returns that one column.
What I want is to move those to a new table, but with the whole row for each match.
My best guess is something like SELECT * FROM (SELECT DISTINCT `location` FROM myTable), or something like that, but it gives me a vague syntax error.
Is there a good way to grab the rest of each DISTINCT row and move it to a new table all in one go?
SELECT * FROM myTable GROUP BY `location`
or if you want to move to another table
CREATE TABLE foo AS SELECT * FROM myTable GROUP BY `location`
DISTINCT applies to the entire row returned. So you can simply use
SELECT DISTINCT * FROM myTable GROUP BY `location`
Using Distinct on a single column doesn't make a lot of sense. Let's say I have the following simple set
-id- -location-
1 store
2 store
3 home
If there were some sort of query that returned all columns but was distinct only on location, which row would be returned, 1 or 2? Should it just pick one at random? Because of this, DISTINCT applies to all columns in the result set returned.
Well, first you need to decide what you really want returned.
The problem is that, presumably, for some of the location values in your table there are different values in the other columns even when the location value is the same:
Location OtherCol StillOtherCol
Place1 1 Fred
Place1 89 Fred
Place1 1 Joe
In that case, which of the three rows do you want to select? When you talk about a DISTINCT Location, you're condensing those three rows of different data into a single row, so there's no meaning to moving the original rows from the original table into a new table, since those original rows no longer exist in your DISTINCT result set. (If all the other columns are always the same for a given Location, your problem is easier: just SELECT DISTINCT * FROM YourTable.)
If you don't care which values come from the other columns you can use a (bad, IMHO) MySQL extension to SQL and do:
SELECT * FROM YourTable GROUP BY Location
which will give a result set with one row per location and values for the other columns derived from the original data in an undefined fashion.
Multiple rows with identical values in all columns don't make any sense. OK, the question might be about a way to correct exactly that situation.
Considering this table, with id being the PK:
kram=# select * from foba;
id | no | name
----+----+---------------
2 | 1 | a
3 | 1 | b
4 | 2 | c
5 | 2 | a,b,c,d,e,f,g
you may extract a sample row for every single no (which corresponds to location here) by grouping over that column and selecting the row with the minimum PK (for example):
SELECT * FROM foba WHERE id IN (SELECT min (id) FROM foba GROUP BY no);
id | no | name
----+----+------
2 | 1 | a
4 | 2 | c
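To tie this back to the original goal of moving one row per location into a new table, the same minimum-PK idea can be combined with CREATE TABLE ... AS (a sketch; it assumes myTable has a unique id column, like the example above):
CREATE TABLE myTableDistinct AS
SELECT t.*
FROM myTable t
JOIN (SELECT MIN(id) AS id FROM myTable GROUP BY location) m
  ON t.id = m.id;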