I'm trying to complete an SQL query homework question and cannot figure out how to add 2 rows together from the same table, with the intention of deleting one after they are combined.
I have a table named 'country' with 241 rows of data and the following columns:
name | region | area | population | GDP
------------------------------------------------------------------------
Hong Kong | Southeast Asia | 1040 | 5542869 | 136100000000
China | Asia | 9596960 | 1203097268 | 2978800000000
Expected output:
name | region | area | population | GDP
-------------------------------------------------------------------
Hong Kong | Southeast Asia | 1040 | 5542869 | 136100000000
China | Asia | 9598000 | 1208640137 | 3114900000000
The goal is to keep the name of the row as "China" and the region as "Asia" while adding the numeric values from the "Hong Kong" row to the "China" row for the columns area, population, and GDP.
I have tried UNION and MERGE but I'm not familiar with using them and couldn't get it to work.
I feel like it has to be something like the SQL query below:
update country
set area = area.HongKong + area.China
where name = 'China';
but I don't know the proper way to reference a specific row.
Ah yes, homework. I remember those days.
Here is some code to help out -
--create a temp table to hold sample data
select
*
into #country
from
(
values
('Hong Kong', 'Southeast Asia', 1040, 5542869, 136100000000),
('China', 'Asia', 9596960, 1203097268, 2978800000000),
('USA', 'North America', 1, 10, 100) --some other row in table
) d ([name], region, area, population, gdp);
select * from #country;
Here is what the data looks like in this table:
--update statement
with
Totals as
(
select
count(1) rows, --should be 2
sum(area) areaTotal,
sum(population) populationTotal,
sum(gdp) gdpTotal
from #country
where [name] in ('Hong Kong', 'China') --this filter may be insufficient; it would be better to select by a primary key such as a rowID but the table does not have one
)
update c
set c.area = t.areaTotal,
c.population = t.populationTotal,
c.gdp = t.gdpTotal
from #country c
inner join totals t
on c.[name] = 'China';
Here is what the data looks like after the update:
There are a few noteworthy things here:
1. The code above the update statement, with Totals as (..., is called a common table expression (abbreviated as CTE). It is used here to contain the query that calculates the totals needed to perform the update. A CTE is one approach to creating an intermediate data set. Alternative approaches include creating a temp table or using a derived table.
2. To see the results of the query within the CTE, those lines of code can be highlighted in your editor and executed against the database.
3. The original post said there are 241 rows of data in the country table. This means it is important to have a reliable filter in the query that calculates the totals, so that the correct rows for summing are isolated. The preferred way to filter the rows is to use a primary key, such as a rowID. Since this table does not have a primary key column, I made a guess and used the [name] column.
4. The purpose of count(1) rows in the CTE is to make sure only 2 rows are being summed. If this number comes back as anything other than 2, there is a problem with the filter and the where clause needs to be modified. The number of rows will only be visible when you do #2 above.
Now the Hong Kong row needs to be cleaned up:
--delete old row
delete from #country where [name] = 'Hong Kong';
This is what the data looks like now:
Again, it would be preferable to use a primary key column in the where clause instead of the [name] column to be certain that the correct row is being deleted.
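For anyone who wants to try the whole merge-then-delete flow without a SQL Server instance, here is a minimal sketch using Python's built-in sqlite3 module. It is not the T-SQL above: SQLite has no `update ... from` with a CTE in older versions, so correlated subqueries are swapped in instead. The helper name `merge_rows` is made up for illustration, and the row-count check plays the same role as count(1) in the CTE.

```python
import sqlite3


def merge_rows(conn, keep, absorb):
    """Add absorb's numeric columns into keep's row, then delete absorb's row."""
    with conn:  # commits on success, rolls back if an exception is raised
        found = conn.execute(
            "SELECT COUNT(*) FROM country WHERE name IN (?, ?)", (keep, absorb)
        ).fetchone()[0]
        if found != 2:  # same safety check as count(1) in the CTE
            raise ValueError(f"expected 2 rows, found {found}")
        conn.execute(
            """
            UPDATE country
            SET area = area + (SELECT area FROM country WHERE name = :absorb),
                population = population
                    + (SELECT population FROM country WHERE name = :absorb),
                gdp = gdp + (SELECT gdp FROM country WHERE name = :absorb)
            WHERE name = :keep
            """,
            {"keep": keep, "absorb": absorb},
        )
        conn.execute("DELETE FROM country WHERE name = ?", (absorb,))


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE country "
    "(name TEXT, region TEXT, area INTEGER, population INTEGER, gdp INTEGER)"
)
conn.executemany(
    "INSERT INTO country VALUES (?, ?, ?, ?, ?)",
    [
        ("Hong Kong", "Southeast Asia", 1040, 5542869, 136100000000),
        ("China", "Asia", 9596960, 1203097268, 2978800000000),
        ("USA", "North America", 1, 10, 100),  # some other row in the table
    ],
)

merge_rows(conn, "China", "Hong Kong")
rows = conn.execute(
    "SELECT name, area, population, gdp FROM country ORDER BY name"
).fetchall()
print(rows)
```

Doing the update and the delete inside one transaction means a failure halfway through does not leave the table with the totals added but the old row still present.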
Please have a look at the linked image to understand my scribbling in the question.
A table holds users from different states of Australia who used different approaches to reach the application.
The table has a column for user IDs, [id] (1, 2, 3, 4, 5, 6 ... 14000), a column approaches (a, b, c, d, e, f), and a column states (wa, vic, ...).
I need to build a table whose first column is the approach and whose remaining columns are named after the states, such as westernaustralia, victoria, SA, queensland, etc.
The row for approach 'a' would hold the total number of people who used that approach in each state, e.g. victoria 5, wa 0.
In the same way, the rows for the other 5 approaches would hold the number of people who used each approach in each state (in columns).
(e.g. approach 'b' row: vic 3, wa 1, sa 2, etc.)
Here's the link to the image: https://i.stack.imgur.com/DldhT.png
You can use a PIVOT statement like the one below.
create table data (id int, [state] nvarchar(100), approaches nvarchar(10));
insert into data values
(1,'nsw','a'),
(2,'','b'),
(3,'wa','c'),
(4,'qld','d');
select
approaches,
[New South Wales]=[nsw],
[Western Australia]=[wa],
[Queensland]=[qld],
[Victoria]=[vic],
[South Australia]=[sa],
[not available]
from
(
select
id,
case
when [state] ='' then 'not available'
else [state]
end as [state],
approaches
from data
)src
pivot
(
count(id) for [state] in ([nsw],[wa],[qld],[vic],[sa],[not available])
)p
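If PIVOT is not available in your database, the same shape can usually be produced with conditional aggregation. The sketch below is an alternative technique, not the PIVOT query itself, and runs against SQLite through Python's sqlite3 module. The row with id 5 is an extra sample row added here so that one count is greater than 1, and the column name not_available is chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (id INTEGER, state TEXT, approaches TEXT)")
conn.executemany(
    "INSERT INTO data VALUES (?, ?, ?)",
    [
        (1, "nsw", "a"),
        (2, "", "b"),  # empty state, reported as not_available
        (3, "wa", "c"),
        (4, "qld", "d"),
        (5, "nsw", "a"),  # extra illustrative row, not in the original sample
    ],
)

# One SUM(CASE ...) per state: each row contributes 1 to exactly one column.
PIVOT_SQL = """
SELECT approaches,
       SUM(CASE WHEN state = 'nsw' THEN 1 ELSE 0 END) AS nsw,
       SUM(CASE WHEN state = 'wa'  THEN 1 ELSE 0 END) AS wa,
       SUM(CASE WHEN state = 'qld' THEN 1 ELSE 0 END) AS qld,
       SUM(CASE WHEN state = 'vic' THEN 1 ELSE 0 END) AS vic,
       SUM(CASE WHEN state = 'sa'  THEN 1 ELSE 0 END) AS sa,
       SUM(CASE WHEN state = ''    THEN 1 ELSE 0 END) AS not_available
FROM data
GROUP BY approaches
ORDER BY approaches
"""

pivot = conn.execute(PIVOT_SQL).fetchall()
for row in pivot:
    print(row)
```

As with PIVOT, the list of states has to be known when the query is written; a new state means a new column in the query.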
I have since modified this process and hopefully simplified it. In the past, I had two tables, tablename and result. Tablename had a column (56th of 90) named cnty, which was char(3). The second table, result, had a field that was essentially 000 + cnty. This field is named area. I was trying to insert the cnty values from tablename, prepend the three zeros, and place them in the result table under the area column.
Now I have both columns in result. Area is blank for now. Cnty contains the values that were in tablename (79954 of them).
Sample data
area | employment | busdesc | cnty
(blank) | 410 | gas station | 003
Desired Result
area | employment | busdesc | cnty
000003 | 410 | gas station | 003
Try the following query:
update dbo.result
set area = concat('000',cnty);
Hope it helps!
update res
set res.area = '000' + tbl.cnty
from Result res inner join [originalTable] tbl
on res.id=tbl.id --Don't know how to join both tables, let us know
EDIT: In regard to the last comment, let's create a new table, Results2 (because I don't know all the columns in the table). This table will have all the columns of tablename plus a new Area column.
select *, '000' + cnty as AREA
into Results2
from dbo.tablename;
You have to specify each column or, in this case, an asterisk * will do.
Is this the syntax you are looking for?
insert into dbo.result (name1, area)
select name1, '000' + cnty
from dbo.tablename;
Or, if the table doesn't already exist:
select name1, ('000' + cnty) as area
into dbo.result
from dbo.tablename;
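To check the in-place variant on the sample row, here is a small sketch using Python's sqlite3 module. SQLite concatenates strings with `||` rather than `+` or `concat`, so that operator is swapped in. The sketch assumes cnty is stored as text, as the char(3) column in the question is; with a numeric cnty the leading zeros would already be gone before any prefix is added.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# cnty is TEXT so that '003' keeps its leading zeros
conn.execute(
    "CREATE TABLE result "
    "(area TEXT, employment INTEGER, busdesc TEXT, cnty TEXT)"
)
conn.execute(
    "INSERT INTO result (employment, busdesc, cnty) "
    "VALUES (410, 'gas station', '003')"
)

# Prefix every cnty with three zeros and store it in area
with conn:
    conn.execute("UPDATE result SET area = '000' || cnty")

updated = conn.execute(
    "SELECT area, employment, busdesc, cnty FROM result"
).fetchall()
print(updated)
```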
I currently have a table called tempHouses that looks like:
avgprice | dates | city
dates are stored as yyyy-mm-dd
However I need to move the records from that table into a table called houses that looks like:
city | year2002 | year2003 | year2004 | year2005 | year2006
The information in tempHouses contains average house prices from 1995 - 2014.
I know I can use SUBSTRING to get the year from the dates:
SUBSTRING(dates, 0, 4)
So basically, for each city in tempHouses.city, I need to get the average house price for the above years into one record.
Any ideas on how I would go about doing this?
This is an SQL Server approach, and a PIVOT may be better, but here's one way:
SELECT City,
       AVG(year2002) AS year2002,
       AVG(year2003) AS year2003,
       AVG(year2004) AS year2004
FROM (
    SELECT City,
           -- no ELSE branch: rows outside the year yield NULL, which AVG ignores
           -- (ELSE 0 would pull every yearly average toward zero)
           CASE WHEN Dates BETWEEN '2002-01-01T00:00:00' AND '2002-12-31T23:59:59' THEN avgprice
           END AS year2002,
           CASE WHEN Dates BETWEEN '2003-01-01T00:00:00' AND '2003-12-31T23:59:59' THEN avgprice
           END AS year2003,
           CASE WHEN Dates BETWEEN '2004-01-01T00:00:00' AND '2004-12-31T23:59:59' THEN avgprice
           END AS year2004
           -- Repeat for each year
    FROM tempHouses
) AS yearly -- SQL Server requires an alias on a derived table
GROUP BY City
The inner query gets the data into the correct format for each record (City, year2002, year2003, year2004), whilst the outer query gets the average for each City.
There may be many ways to do this, and performance may be the deciding factor in which one to choose.
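One detail worth checking on a tiny data set is what the CASE returns for rows outside the year. A CASE without an ELSE yields NULL, which AVG skips, whereas a 0 placeholder is counted as a real price. The sketch below uses Python's sqlite3 module; the city name and prices are made-up sample values, and substr is used to pick the year because the dates are stored as yyyy-mm-dd text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tempHouses (avgprice REAL, dates TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO tempHouses VALUES (?, ?, ?)",
    [
        (100.0, "2002-03-01", "Leeds"),  # hypothetical sample rows
        (200.0, "2002-09-01", "Leeds"),
        (300.0, "2003-06-01", "Leeds"),
    ],
)

YEARLY_SQL = """
SELECT city,
       AVG(CASE WHEN substr(dates, 1, 4) = '2002' THEN avgprice END) AS year2002,
       AVG(CASE WHEN substr(dates, 1, 4) = '2003' THEN avgprice END) AS year2003,
       -- same as year2002 but with a 0 placeholder, shown only for comparison
       AVG(CASE WHEN substr(dates, 1, 4) = '2002' THEN avgprice ELSE 0 END)
           AS year2002_with_zero
FROM tempHouses
GROUP BY city
"""

city, year2002, year2003, year2002_with_zero = conn.execute(YEARLY_SQL).fetchone()
print(city, year2002, year2003, year2002_with_zero)
```

Here the 2002 average comes out as 150 with NULLs but 100 with zeros, because the 2003 row is counted as a price of 0.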
The best way would be to use a script to perform the query execution for you because you will need to run it multiple times and you extract the data based on year. Make sure that the only required columns are city & row id:
http://dev.mysql.com/doc/refman/5.0/en/insert-select.html
INSERT INTO <table> (city) SELECT DISTINCT `city` FROM <old_table>;
Then for each city extract the average values, insert them into a temporary table and then insert into the main table.
SELECT city, SUBSTRING(dates, 1, 4) AS yr, AVG(avgprice) AS avg_price FROM <old_table> GROUP BY city, yr;
Otherwise you're looking at a combination query using joins and potentially unions to extrapolate the data. Because you're flattening the table into a single row per city it's going to be a little tough to do. You should create indexes first on the date column if you don't want the database query to fail with memory limits or just take a very long time to execute.
I want to create a geography dimension using SSIS 2008. I have 3 source tables.
Here is the explanation
Table 1 = Country: country code and country name
Table 2 = Post code: post code and city name
Table 3 = Territory : Territory code and Territory name
Here is how data looks
[Table 1= Country]
code name
------------------
US | United states
CA | Canada
[Table 2= post code]
Code city
---------------
1000 | Paris
2000 | Niece
[Table 3= Territory]
Code name
----------------
N | North
S | south
As you can see there is no single common column, I want to group these 3 tables in the same geography dimension.
So, how can I do it?
Also, this geography dimension will be used together with other dimensions, for example a customer dimension: we want to know the revenue of a client according to his geography, or the top salespersons in some city.
In both the customer and salesperson tables, you can find those 3 codes as foreign keys.
You don't need a "common column" shared by all three tables.
You do need a "common column" between each pair of tables. How else are you going to link them?
Q: Is there any column that links "Country" to "City"? You should have a "country code" column in "city".
Q: Is there any way to link "Territory" with either "post code" or "country"? If "Yes": problem solved. Please list the fields. If "No" ... then you need to change your schema.
Based on your comment to paulsm4, you then want to use the tables that hold the linking information to join to each of the above 3 tables.
On the other hand, if you really want to join just those three tables:
select * from Country
full outer join [Post code]
on 'a' = 'a'
full outer join Territory
on 'b' = 'b'
create table dim.geography (geoID int identity(1,1), citycode int, countrycode char(2), territorycode char(1));
-- geoID is assumed to be an identity column so that it does not have to be supplied
insert into dim.geography (citycode, countrycode, territorycode)
select city, country, territory from Customer
union
select city, country, territory from salesperson;
Assuming here that the Customer and salesperson tables hold the codes, not the names, for city, country, and territory.
The code above will build a dimension for the geography you want. Of course if you add any additional unique city,country,territory codes into the customer/salesperson tables you will need to add it to your dimension. This is just an initial load. You may also need to modify the code to account for nulls in any of the three qualifiers.
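A quick way to see what this initial load produces is to run the same idea against SQLite with Python's sqlite3 module. This is only a sketch under assumptions: the table is called dim_geography because SQLite has no schemas like dim., AUTOINCREMENT stands in for a generated surrogate key, and the customer and salesperson rows are invented codes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer (city INTEGER, country TEXT, territory TEXT);
    CREATE TABLE salesperson (city INTEGER, country TEXT, territory TEXT);
    CREATE TABLE dim_geography (
        geoID INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
        citycode INTEGER,
        countrycode TEXT,
        territorycode TEXT
    );
    """
)
# Hypothetical code combinations; one combination appears in both tables
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [(1000, "US", "N"), (2000, "CA", "S")],
)
conn.executemany(
    "INSERT INTO salesperson VALUES (?, ?, ?)",
    [(1000, "US", "N"), (2000, "US", "S")],
)

# UNION removes the duplicate combination before the insert
with conn:
    conn.execute(
        """
        INSERT INTO dim_geography (citycode, countrycode, territorycode)
        SELECT city, country, territory FROM customer
        UNION
        SELECT city, country, territory FROM salesperson
        """
    )

dim = conn.execute(
    "SELECT citycode, countrycode, territorycode FROM dim_geography "
    "ORDER BY citycode, countrycode"
).fetchall()
geo_ids = sorted(r[0] for r in conn.execute("SELECT geoID FROM dim_geography"))
print(dim, geo_ids)
```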
I'm wondering can we do this query below?
SELECT America, England, DISTINCT (country) FROM tb_country
which will (my intention is to) display :
America
England
(List of distinct country field in tb_country)
So the point is to display (for example) America and England even if the DISTINCT country field returns nothing. Basically I need this query to populate a select dropdown with some sticky values the user can pick, while still allowing them to add a new country as they wish.
It also goes without saying that, should a row in tb_country have a value of America or England, it will not show up as a duplicate in the query result. So if tb_country has this list of values:
Germany
England
Holland
The query will only output :
America
England
Germany
Holland
You need to use a UNION:
SELECT 'America' AS country
UNION
SELECT 'England' AS country
UNION
SELECT DISTINCT(c.country) AS country
FROM TB_COUNTRY c
UNION will remove duplicates; UNION ALL will not (but is faster for it).
The data type must match for each ordinal position in the SELECT clause. Meaning, if the first column in the first query were INT, the first column for all the unioned statements afterwards need to be INT as well or NULL.
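The difference between UNION and UNION ALL is easy to verify on the sample values from the question. Below is a minimal sketch using Python's sqlite3 module; the second Germany row is added here only to show de-duplication inside the table as well, and the ORDER BY is an addition so that the output order is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tb_country (country TEXT)")
conn.executemany(
    "INSERT INTO tb_country VALUES (?)",
    [("Germany",), ("England",), ("Holland",), ("Germany",)],
)

# UNION: the fixed values plus the table values, duplicates removed
union_rows = [
    r[0]
    for r in conn.execute(
        """
        SELECT 'America' AS country
        UNION
        SELECT 'England'
        UNION
        SELECT country FROM tb_country
        ORDER BY country
        """
    )
]

# UNION ALL: the same inputs, but every row is kept
union_all_rows = [
    r[0]
    for r in conn.execute(
        """
        SELECT 'America' AS country
        UNION ALL
        SELECT 'England'
        UNION ALL
        SELECT country FROM tb_country
        """
    )
]

print(union_rows)
print(len(union_all_rows))
```

Since UNION already removes duplicates across the whole result, the DISTINCT on the last SELECT is not needed for correctness.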
Why not add a weight column to tb_country and use an ORDER BY clause?
Perform once:
update tb_country set weight = 1 where country = 'England';
update tb_country set weight = 1 where country = 'America';
Then use it:
-- group by replaces distinct so that the result can be ordered by weight
select country from tb_country group by country order by max(weight) desc;
Another way is to use an extra country table with two columns (country, weight) and an outer join.
Personally, I would prefer a country table with a UNIQUE constraint on the country field, referenced through a foreign key.