Simple SQL: how do I group into separate columns?

Say I keep stock prices in a three-column table like this:
create table stocks(
ticker text,
day int,
price int
);
insert into stocks values ('aapl', 1, 100);
insert into stocks values ('aapl', 2, 104);
insert into stocks values ('aapl', 3, 98);
insert into stocks values ('aapl', 4, 99);
insert into stocks values ('goog', 1, 401);
insert into stocks values ('goog', 2, 390);
insert into stocks values ('goog', 3, 234);
And I want results that look like:
day aapl goog
1 100 401
2 104 390
3 98 234
4 99 null
Do I really need to select twice, once for each ticker, and then outer join the results?

You can do it in a single query with conditional aggregation, like this:
SELECT day,
MAX(CASE WHEN ticker = 'aapl' THEN price END) AS aapl,
MAX(CASE WHEN ticker = 'goog' THEN price END) AS goog
FROM stocks
GROUP BY day
ORDER BY day
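A quick way to check the conditional-aggregation approach is to run it against an in-memory SQLite database from Python. This is a sketch that assumes SQLite's dialect, which accepts the same CASE/MAX syntax; the table and rows are copied from the question.

```python
import sqlite3

# In-memory SQLite database holding the question's stocks table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stocks (ticker TEXT, day INT, price INT)")
conn.executemany(
    "INSERT INTO stocks VALUES (?, ?, ?)",
    [("aapl", 1, 100), ("aapl", 2, 104), ("aapl", 3, 98), ("aapl", 4, 99),
     ("goog", 1, 401), ("goog", 2, 390), ("goog", 3, 234)],
)

# One output column per ticker, one row per day. CASE yields NULL for the
# other ticker's rows, and MAX ignores NULLs, so each cell keeps one price.
rows = conn.execute("""
    SELECT day,
           MAX(CASE WHEN ticker = 'aapl' THEN price END) AS aapl,
           MAX(CASE WHEN ticker = 'goog' THEN price END) AS goog
    FROM stocks
    GROUP BY day
    ORDER BY day
""").fetchall()

for row in rows:
    print(row)
```

Day 4 comes back as `(4, 99, None)`: the missing goog price stays NULL, matching the desired output, with no second select or outer join.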

Regardless of the database you are using, what you are trying to achieve is called a "pivot table".
Here's an example for mysql:
http://en.wikibooks.org/wiki/MySQL/Pivot_table
Some databases have built-in features for this; see the links below.
SQL Server:
http://msdn.microsoft.com/de-de/library/ms177410.aspx
Oracle:
http://www.dba-oracle.com/t_pivot_examples.htm
You can always create a pivot by hand: select all the aggregations into a result set and then select from that result set.
Note that in your case you could also put all the names into one column with a concatenating aggregate (I think that's GROUP_CONCAT in MySQL), since you cannot know in advance how many names are related to a ticker.
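As a sketch of the concatenation idea, SQLite has a `group_concat(X, separator)` aggregate similar to MySQL's GROUP_CONCAT. The example below, assuming the question's stocks table, collapses every price for a ticker into one text column; without an ORDER BY inside the aggregate, the order of the concatenated values is not guaranteed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stocks (ticker TEXT, day INT, price INT)")
conn.executemany(
    "INSERT INTO stocks VALUES (?, ?, ?)",
    [("aapl", 1, 100), ("aapl", 2, 104), ("aapl", 3, 98), ("aapl", 4, 99),
     ("goog", 1, 401), ("goog", 2, 390), ("goog", 3, 234)],
)

# One row per ticker; all of its prices end up in a single comma-separated string.
concatenated = dict(conn.execute(
    "SELECT ticker, group_concat(price, ',') FROM stocks GROUP BY ticker"
).fetchall())

print(concatenated)
```

This works for any number of values per ticker, at the cost of getting text back instead of separate columns.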

Yes, you do, unless your database has SQL extensions for pivoting. Here's how you do it in Microsoft SQL Server.

Related

Remove clients who don't have two rows under their name in SQL

I'm trying to filter for the clients who registered at least twice in the DB, because I need to know which of them came back. I'm working with a table that records every registration in the system, as follows:
order #  client  date
One      Andrew  XX
Two      Andrew  XX+1
Three    Andrew  XX+2
One      David   YY
One      Marc    ZZ
Two      Marc    ZZ+1
In this case I want to delete David's record, as I only want clients who have an order number other than "One".
I tried this SQL:
select *
from table
where order_number > 1
However, this removes all the first-order rows, including those of clients who came back.
Does anybody know an easy way to compare client names and filter on that, or how I could delete the rows of clients who have only one entry?
You need something like this:
select * from yourtable t
where exists (select 1 from yourtable t2
where t2.client = t.client
and t2.order_number <> t.order_number)
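The correlated EXISTS check can be tried in SQLite from Python. This sketch assumes a hypothetical table named `orders` (the question's `table` is a reserved word) holding the question's order numbers as text; a row survives only if the same client has another row with a different order number.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_number TEXT, client TEXT)")  # hypothetical name
conn.executemany("INSERT INTO orders VALUES (?, ?)", [
    ("One", "Andrew"), ("Two", "Andrew"), ("Three", "Andrew"),
    ("One", "David"),
    ("One", "Marc"), ("Two", "Marc"),
])

# The subquery is correlated on client, so it is evaluated per outer row.
rows = conn.execute("""
    SELECT o.order_number, o.client
    FROM orders o
    WHERE EXISTS (SELECT 1 FROM orders o2
                  WHERE o2.client = o.client
                    AND o2.order_number <> o.order_number)
""").fetchall()

clients = sorted({client for _, client in rows})
print(clients)
```

Without the `o2.client = o.client` correlation, the subquery would ignore the outer row and return the same answer for every row.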
or:
select client
from tablename
group by client
having count(*) > 1
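The GROUP BY/HAVING version returns just the clients with more than one row. A minimal SQLite sketch, again assuming a hypothetical `orders` table with the question's data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_number TEXT, client TEXT)")  # hypothetical name
conn.executemany("INSERT INTO orders VALUES (?, ?)", [
    ("One", "Andrew"), ("Two", "Andrew"), ("Three", "Andrew"),
    ("One", "David"),
    ("One", "Marc"), ("Two", "Marc"),
])

# HAVING filters whole groups after aggregation; WHERE could not see COUNT(*).
repeat_clients = conn.execute("""
    SELECT client
    FROM orders
    GROUP BY client
    HAVING COUNT(*) > 1
    ORDER BY client
""").fetchall()

print(repeat_clients)
```

If you need the full rows rather than just the names, you can join this result back to the table on client.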
CREATE TABLE records (
ID INTEGER PRIMARY KEY,
order_number TEXT NOT NULL,
client TEXT NOT NULL,
date DateTime NOT NULL
);
INSERT INTO records VALUES (1, 'ONE', 'Andrew', '01.01.1999');
INSERT INTO records VALUES (2, 'TWO', 'Andrew', '02.02.1999');
INSERT INTO records VALUES (3, 'THREE', 'Andrew', '03.03.1999');
INSERT INTO records VALUES (4, 'ONE', 'David', '01.01.1999');
INSERT INTO records VALUES (5, 'ONE', 'Marc', '01.01.1999');
INSERT INTO records VALUES (6, 'TWO', 'Marc', '01.03.1999');
-- delete every row of a client who appears only once
DELETE FROM records WHERE client IN
(
SELECT client FROM records
GROUP BY client HAVING COUNT(*) = 1
);
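To see what the DELETE leaves behind, here is the same `records` table in SQLite via Python. It is a sketch assuming SQLite, with the dates stored as plain text exactly as in the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE records (
        ID INTEGER PRIMARY KEY,
        order_number TEXT NOT NULL,
        client TEXT NOT NULL,
        date TEXT NOT NULL
    )
""")
conn.executemany("INSERT INTO records VALUES (?, ?, ?, ?)", [
    (1, "ONE", "Andrew", "01.01.1999"),
    (2, "TWO", "Andrew", "02.02.1999"),
    (3, "THREE", "Andrew", "03.03.1999"),
    (4, "ONE", "David", "01.01.1999"),
    (5, "ONE", "Marc", "01.01.1999"),
    (6, "TWO", "Marc", "01.03.1999"),
])

# The subquery lists clients that occur exactly once; their rows are removed.
conn.execute("""
    DELETE FROM records
    WHERE client IN (SELECT client FROM records
                     GROUP BY client HAVING COUNT(*) = 1)
""")

remaining = conn.execute("SELECT ID, client FROM records ORDER BY ID").fetchall()
print(remaining)
```

Only David's single row is deleted; matching on `client` rather than `ID` is what keeps the subquery's output comparable to the column being filtered.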

SQL query to print the name of the employee getting max % hike in the annual salary from 2018 to 2019

I am very new to SQL and have been practicing straightforward queries for a few days. I came across this question and would like some help with it. There is a table with employee name, salary and salary date as columns. The date is simply a year from 2015 to 2019. I am trying to write a query that returns the name of the employee with the maximum % hike in salary from 2018 to 2019. I wrote the query below but have been stuck on it for hours.
CREATE TABLE data (
salary INTEGER NOT NULL,
emp_name TEXT NOT NULL,
Sal_Date YEAR NOT NULL
);
INSERT INTO data VALUES (10000, 'Ryan', 2015);
INSERT INTO data VALUES (12000, 'Bryan', 2016);
INSERT INTO data VALUES (11000, 'Manthan', 2016);
INSERT INTO data VALUES (15000, 'Susan', 2017);
INSERT INTO data VALUES (16000, 'Alien', 2017);
INSERT INTO data VALUES (10000, 'Ryan', 2018);
INSERT INTO data VALUES (12000, 'Bryan', 2018);
INSERT INTO data VALUES (11000, 'Manthan', 2018);
INSERT INTO data VALUES (15000, 'Susan', 2018);
INSERT INTO data VALUES (16000, 'Alien', 2018);
INSERT INTO data VALUES (11000, 'Ryan', 2019);
INSERT INTO data VALUES (13000, 'Bryan', 2019);
INSERT INTO data VALUES (15000, 'Manthan', 2019);
INSERT INTO data VALUES (18000, 'Susan', 2019);
INSERT INTO data VALUES (32000, 'Alien', 2019);
SELECT salary from data
group by ;
I can't see the logic to solve this. Can anybody please help with the query and explain the logic? I'll be grateful. Thanks
SELECT *, (salary - LAG(salary, 1, salary) OVER (PARTITION BY emp_name ORDER BY Sal_Date)) * 100.0
/ LAG(salary, 1, salary) OVER (PARTITION BY emp_name ORDER BY Sal_Date) AS promotionPercentage
FROM data
ORDER BY emp_name, Sal_Date
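Here is a sketch of the LAG approach narrowed down to the 2018 to 2019 hike, run in SQLite from Python (SQLite supports window functions since 3.25). It assumes, as in the sample data, that every employee has a 2018 row, so the previous row for each 2019 row is the 2018 salary. Multiplying by `100.0` forces real division; with two INTEGER columns, plain `/` would truncate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (salary INTEGER, emp_name TEXT, Sal_Date INTEGER)")
conn.executemany("INSERT INTO data VALUES (?, ?, ?)", [
    (10000, "Ryan", 2015), (12000, "Bryan", 2016), (11000, "Manthan", 2016),
    (15000, "Susan", 2017), (16000, "Alien", 2017),
    (10000, "Ryan", 2018), (12000, "Bryan", 2018), (11000, "Manthan", 2018),
    (15000, "Susan", 2018), (16000, "Alien", 2018),
    (11000, "Ryan", 2019), (13000, "Bryan", 2019), (15000, "Manthan", 2019),
    (18000, "Susan", 2019), (32000, "Alien", 2019),
])

# LAG(salary) is the same employee's salary on the previous row by Sal_Date.
ranked = conn.execute("""
    SELECT emp_name, pct
    FROM (
        SELECT emp_name, Sal_Date,
               (salary - LAG(salary) OVER (PARTITION BY emp_name ORDER BY Sal_Date)) * 100.0
               / LAG(salary) OVER (PARTITION BY emp_name ORDER BY Sal_Date) AS pct
        FROM data
    ) t
    WHERE Sal_Date = 2019
    ORDER BY pct DESC
""").fetchall()

print(ranked[0])
```

Filtering on `Sal_Date = 2019` has to happen in the outer query; filtering inside would hide the 2018 rows that LAG needs.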
You can use a self-join to join the table to itself and return the salary for the previous year, then order by the percentage change in salary and return the first row:
SELECT TOP 1 t1.emp_name
FROM data t1
JOIN data t2 ON t1.emp_name = t2.emp_name AND t1.Sal_Date = t2.Sal_Date + 1
WHERE t1.Sal_Date = 2019
ORDER BY (t1.salary - t2.salary) * 100.0 / t2.salary DESC
Alternatively, you can use the LAG analytic function, though I am not sure it is available in every DBMS.
I am using MSSQL for this one
You can self-join the table with the condition that:
same emp_name
the first table is for 2018, the other one 2019
Then, you can compare the salary in 2018 and 2019 in the same row.
Here is a query example. You may need to modify it a bit depending on your SQL engine.
SELECT
d18.emp_name,
cast(d19.salary AS FLOAT) / d18.salary - 1 AS hike_18_to_19
FROM
data d18
INNER JOIN
data d19
ON
d18.emp_name = d19.emp_name
AND d18.Sal_Date = 2018
AND d19.Sal_Date = 2019
ORDER BY
hike_18_to_19 DESC
You get something like this:
emp_name  hike_18_to_19
Alien     1.000000
Manthan   0.363636
Susan     0.200000
Ryan      0.100000
Bryan     0.083333
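To get only the top employee, the same self-join can be ordered and cut to one row. This sketch uses SQLite from Python, where `LIMIT 1` takes the place of SQL Server's `TOP 1` and `REAL` stands in for `FLOAT`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (salary INTEGER, emp_name TEXT, Sal_Date INTEGER)")
conn.executemany("INSERT INTO data VALUES (?, ?, ?)", [
    (10000, "Ryan", 2015), (12000, "Bryan", 2016), (11000, "Manthan", 2016),
    (15000, "Susan", 2017), (16000, "Alien", 2017),
    (10000, "Ryan", 2018), (12000, "Bryan", 2018), (11000, "Manthan", 2018),
    (15000, "Susan", 2018), (16000, "Alien", 2018),
    (11000, "Ryan", 2019), (13000, "Bryan", 2019), (15000, "Manthan", 2019),
    (18000, "Susan", 2019), (32000, "Alien", 2019),
])

# d18 and d19 are two copies of the same table, pinned to one year each,
# so both salaries for an employee sit side by side in one joined row.
top = conn.execute("""
    SELECT d19.emp_name,
           CAST(d19.salary AS REAL) / d18.salary - 1 AS hike_18_to_19
    FROM data d18
    JOIN data d19
      ON d18.emp_name = d19.emp_name
     AND d18.Sal_Date = 2018
     AND d19.Sal_Date = 2019
    ORDER BY hike_18_to_19 DESC
    LIMIT 1
""").fetchone()

print(top)
```

The hike is a fraction here (1.0 means +100%); multiply by 100 if you want a percentage.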

Avoiding GROUP BY for a column used in DATEDIFF?

As the database is currently constructed, I can only use a date field of a certain table in a DATEDIFF function, and that table is also part of a COUNT aggregation (the count is not on the date field itself, but on the entity where that date field is not null). The GROUP BY at the end messes up the counting, since the one entry with that date is counted on its own, as its own group.
In some detail:
Our lead recruiter wants a report that shows the number of applications and conducted interviews per opening. So far no problem. Additionally, he would like to see the total duration per opening from making it public to signing a new employee, and of course only if the opening has already been filled.
I have 4 tables to join:
table 1 holds the data of the opening
table 2 has the single applications
table 3 has the interview data of the applications
table 4 has the data regarding the publication of the openings (with the date when a certain opening was made public)
The problem is the duration requirement. Table 4 holds the starting point, and in table 2 one (or no) applicant per opening has a date field filled with the time they returned a signed contract, after which the opening counts as filled. When I use that field in a DATEDIFF I'm forced to also put that column in the GROUP BY clause, and that results in two rows per opening: one row has all the numbers as wanted, and the second row always holds the one person who has an entry in that date field...
So far I haven't come up with a way of avoiding that problem, other than explaining to my colleague that he gets his time-to-fill number in another report.
SELECT
table1.col1 AS NameOfProject,
table1.col2 AS Company,
table1.col3 AS OpeningType,
table1.col4 AS ReasonForOpening,
count(table2.col2) AS NumberOfApplications,
sum(case when table2.colSTATUS = 'withdrawn' then 1 else 0 end) AS NumberOfApplicantsWhoWithdraw,
sum(case when table3.colTypeInterview = 'PhoneInterview' then 1 else 0 end) AS NumberOfPhoneInterview,
...more sum columns...,
table1.finished, -- shows '1' if the opening is occupied
DATEDIFF(day, table4.colValidFrom, table2.colContractReceived) AS DaysToCompletion -- the problem column
FROM
table2 left join table3 on table2.REF_NR = table3.REF_NR
join table1 on table2.PROJEKT = table1.KBEZ
left join table4 on table1.REFNR = table4.PRJ_REFNR
GROUP BY
-- all columns except the ones inside aggregate (SUM/COUNT) functions
table1.col1, table1.col2, table1.col3, table1.col4, table1.finished,
table4.colValidFrom,
table2.colContractReceived -- forced in by the DATEDIFF; this splits filled openings into two rows
ORDER BY NameOfProject
Here is a short rebuild of what it looks like. The first row is an opening that is not filled, and all aggregations come out in one row as wanted. The next project/opening shows up twice, because the field used in the DATEDIFF is grouped separately...
project; company; no_of_applications; no_of_phoneinterview; no_of_personalinterview; ...; time_to_fill_in_days; filled?
2018_312; comp a; 27; 4; 2; ...; null; 0
2018_313; comp b; 54; 7; 4; ...; null; 0
2018_313; comp b; 1; 1; 1; ...; 42; 1
I'd be glad to get any idea how to solve this. Thanks for considering my request!
(During the 'translation' of all the specific column and table names I might have built in a syntax error here and there, but the query worked well except for that unwanted extra aggregation per filled opening.)
If I've understood your requirement properly, the issue is that you need to show the number of days between the starting point and the date an applicant returned a signed contract, but with only a single row per opening: the filled row if the position was filled, otherwise the unfilled row.
I've achieved this result by assuming that you count a position as filled using the "ContractsReceived" column. This may be wrong, but the principle should still provide what you are looking for.
I've essentially wrapped your query into a subquery and ranked the rows by the ContractsReceived column in descending order, partitioned by project. Then in the outer query I filter for the first row of this ranking.
Even if my assumption about the column structure and data types is wrong, this should provide you with a model to work with.
The only issue you might have with this ranking solution is if you want to aggregate over both rows within one (so include all of the summed columns for both the position filled and position not filled row per project). If this is the case let me know and we can work around that.
Please let me know if you have any questions.
declare @table1 table (
REFNR int,
NameOfProject nvarchar(20),
Company nvarchar(20),
OpeningType nvarchar(20),
ReasonForOpening nvarchar(20),
KBEZ int
);
declare @table2 table (
NumberOfApplications int,
Status nvarchar(15),
REF_NR int,
ReturnedApplicationDate datetime,
ContractsReceived bit,
PROJEKT int
);
declare @table3 table (
TypeInterview nvarchar(25),
REF_NR int
);
declare @table4 table (
PRJ_REFNR int,
StartingPoint datetime
);
insert into @table1 (REFNR, NameOfProject, Company, OpeningType, ReasonForOpening, KBEZ)
values (1, '2018_312', 'comp a', 'Permanent', 'Business growth', 1),
(2, '2018_313', 'comp a', 'Permanent', 'Business growth', 2),
(3, '2018_313', 'comp a', 'Permanent', 'Business growth', 3);
insert into @table2 (NumberOfApplications, Status, REF_NR, ReturnedApplicationDate, ContractsReceived, PROJEKT)
values (27, 'Processed', 4, '2018-04-01 08:00', 0, 1),
(54, 'Withdrawn', 5, '2018-04-02 10:12', 0, 2),
(1, 'Processed', 6, '2018-04-15 15:00', 1, 3);
insert into @table3 (TypeInterview, REF_NR)
values ('Phone', 4),
('Phone', 5),
('Personal', 6);
insert into @table4 (PRJ_REFNR, StartingPoint)
values (1, '2018-02-25 08:00'),
(2, '2018-03-04 15:00'),
(3, '2018-03-04 15:00');
select * from
(
SELECT
RANK() OVER (PARTITION BY NameOfProject, Company ORDER BY ContractsReceived DESC) as rowno,
table1.NameOfProject,
table1.Company,
table1.OpeningType,
table1.ReasonForOpening,
case when ContractsReceived > 0 then datediff(DAY, StartingPoint, ReturnedApplicationDate) else null end as TimeToFillInDays,
ContractsReceived Filled
FROM
@table2 table2 left join @table3 table3 on table2.REF_NR = table3.REF_NR
join @table1 table1 on table2.PROJEKT = table1.KBEZ
left join @table4 table4 on table1.REFNR = table4.PRJ_REFNR
group by NameOfProject, Company, OpeningType, ReasonForOpening, ContractsReceived,
StartingPoint, ReturnedApplicationDate
) x where rowno = 1
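The rank-then-filter idea can be checked outside SQL Server with SQLite from Python. This is a cut-down sketch under a few assumptions: plain tables replace the table variables, table3 and the descriptive columns are dropped because the output does not use them, and `julianday(date(...))` differences stand in for `DATEDIFF(DAY, ...)`, which counts day boundaries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (REFNR INT, NameOfProject TEXT, Company TEXT, KBEZ INT);
CREATE TABLE table2 (REF_NR INT, ReturnedApplicationDate TEXT,
                     ContractsReceived INT, PROJEKT INT);
CREATE TABLE table4 (PRJ_REFNR INT, StartingPoint TEXT);
INSERT INTO table1 VALUES (1, '2018_312', 'comp a', 1),
                          (2, '2018_313', 'comp a', 2),
                          (3, '2018_313', 'comp a', 3);
INSERT INTO table2 VALUES (4, '2018-04-01 08:00', 0, 1),
                          (5, '2018-04-02 10:12', 0, 2),
                          (6, '2018-04-15 15:00', 1, 3);
INSERT INTO table4 VALUES (1, '2018-02-25 08:00'),
                          (2, '2018-03-04 15:00'),
                          (3, '2018-03-04 15:00');
""")

# RANK() puts the filled row (ContractsReceived = 1) first within each
# project; the outer query keeps only rank 1, i.e. one row per project.
rows = conn.execute("""
    SELECT NameOfProject, Company, TimeToFillInDays, Filled
    FROM (
        SELECT RANK() OVER (PARTITION BY t1.NameOfProject, t1.Company
                            ORDER BY t2.ContractsReceived DESC) AS rowno,
               t1.NameOfProject, t1.Company,
               CASE WHEN t2.ContractsReceived > 0
                    THEN CAST(julianday(date(t2.ReturnedApplicationDate))
                              - julianday(date(t4.StartingPoint)) AS INTEGER)
               END AS TimeToFillInDays,
               t2.ContractsReceived AS Filled
        FROM table2 t2
        JOIN table1 t1 ON t2.PROJEKT = t1.KBEZ
        LEFT JOIN table4 t4 ON t1.REFNR = t4.PRJ_REFNR
    ) x
    WHERE rowno = 1
    ORDER BY NameOfProject
""").fetchall()

print(rows)
```

The 2018_313 opening collapses to its filled row with 42 days, the same figure as in the question; the unfilled row is simply discarded, so any sums it carried would need the extra work mentioned in the answer.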

Product price comparison in sql

I have a table like the one created by the query below. I add product prices to this table daily, with different seller names:
create table Product_Price
(
id int,
dt date,
SellerName varchar(20),
Product varchar(10),
Price money
)
insert into Product_Price values (1, '2012-01-16','Sears','AA', 32)
insert into Product_Price values (2, '2012-01-16','Amazon', 'AA', 40)
insert into Product_Price values (3, '2012-01-16','eBay','AA', 27)
insert into Product_Price values (4, '2012-01-17','Sears','BC', 33.2)
insert into Product_Price values (5, '2012-01-17','Amazon', 'BC',30)
insert into Product_Price values (6, '2012-01-17','eBay', 'BC',51.4)
insert into Product_Price values (7, '2012-01-18','Sears','DE', 13.5)
insert into Product_Price values (8, '2012-01-18','Amazon','DE', 11.1)
insert into Product_Price values (9, '2012-01-18', 'eBay','DE', 9.4)
I want a result like this for any number of sellers (as more sellers are added to the table):
DT PRODUCT Sears[My Site] Amazon Ebay Lowest Price
1/16/2012 AA 32 40 27 Ebay
1/17/2012 BC 33.2 30 51.4 Amazon
1/18/2012 DE 7.5 11.1 9.4 Sears
I think this is what you're looking for.
It's kind of ugly, but here's a little breakdown.
This block allows you to get a dynamic list of your values. (Can't remember who I stole this from, but it's awesome. Without this, pivot really isn't any better than a big giant case statement approach to this.)
DECLARE @cols AS NVARCHAR(MAX)
DECLARE @query AS NVARCHAR(MAX)
select @cols = STUFF((SELECT distinct ',' +
QUOTENAME(SellerName)
FROM Product_Price
FOR XML PATH(''), TYPE
).value('.', 'NVARCHAR(MAX)')
, 1, 1, '')
Your @cols variable comes out like so:
[Amazon],[eBay],[Sears]
Then you need to build a string of your entire query:
select @query =
'select piv1.*, tt.sellername from (
select *
from
(select dt, product, SellerName, sum(price) as price from product_price group by dt, product, SellerName) t1
pivot (sum(price) for SellerName in (' + @cols + ')) as bob
) piv1
inner join
(select t2.dt, t2.sellername, t1.min_price from
(select dt, min(price) as min_price from product_price group by dt) t1
inner join (select dt, sellername, sum(price) as price from product_price group by dt, sellername) t2
on t1.dt = t2.dt and t1.min_price = t2.price) tt
on piv1.dt = tt.dt
'
The piv1 derived table gets you the pivoted values. The cleverly named tt derived table gets you the seller with the minimum price for each day.
(Told you it was kind of ugly.)
And finally, you run your query:
execute(@query)
And you get:
DT PRODUCT AMAZON EBAY SEARS SELLERNAME
2012-01-16 AA 40 27 32 eBay
2012-01-17 BC 30 51.4 33.2 Amazon
2012-01-18 DE 11.1 9.4 13.5 eBay
(sorry, can't make that bit line up).
I would think that if you have a reporting tool that can do crosstabs, this would be a heck of a lot easier to do there.
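The dynamic part of that answer is just string building: collect the distinct seller names, bracket-quote each one, and join them with commas. The Python sketch below mirrors that step. `quote_name` is a hypothetical helper imitating QUOTENAME's default bracket quoting (doubling any `]`), and unlike the T-SQL it sorts the names explicitly, because `SELECT distinct ... FOR XML PATH('')` without an ORDER BY does not promise any order.

```python
def quote_name(name: str) -> str:
    """Hypothetical helper: bracket-quote an identifier like QUOTENAME's default."""
    return "[" + name.replace("]", "]]") + "]"


def build_column_list(seller_names):
    """Distinct names, sorted case-insensitively, quoted and comma-joined."""
    unique = sorted(set(seller_names), key=str.lower)
    return ",".join(quote_name(n) for n in unique)


sellers = ["Sears", "Amazon", "eBay", "Sears", "Amazon", "eBay"]
cols = build_column_list(sellers)

# The column list is spliced into the PIVOT's IN (...) clause.
pivot_clause = "pivot (sum(price) for SellerName in (" + cols + ")) as bob"

print(cols)  # [Amazon],[eBay],[Sears]
print(pivot_clause)
```

Quoting each name matters because the seller names come from data and end up inside executable SQL.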
The problem is this requirement:
I want result like this for n number of sellers
If you have a fixed, known number of columns for your results, there are several techniques to PIVOT your data. But if the number of columns is not known, you're in trouble. The SQL language really wants you to be able to describe the exact nature of the result set for the select list in terms of the number and types of columns up front.
It sounds like you can't do that. This leaves you with two options:
Query the data to know how many stores you have and their names, and then use that information to build a dynamic sql statement.
(Preferred option) Perform the pivot in client code.
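The client-side option can be as small as a dictionary keyed by (date, product). This Python sketch assumes the rows arrive from a plain `SELECT dt, Product, SellerName, Price FROM Product_Price` (hard-coded here from the question's inserts); the seller columns are discovered from the data, so adding a seller needs no query change.

```python
from collections import defaultdict

# Rows as a plain SELECT of dt, Product, SellerName, Price might return them.
rows = [
    ("2012-01-16", "AA", "Sears", 32.0), ("2012-01-16", "AA", "Amazon", 40.0),
    ("2012-01-16", "AA", "eBay", 27.0),
    ("2012-01-17", "BC", "Sears", 33.2), ("2012-01-17", "BC", "Amazon", 30.0),
    ("2012-01-17", "BC", "eBay", 51.4),
    ("2012-01-18", "DE", "Sears", 13.5), ("2012-01-18", "DE", "Amazon", 11.1),
    ("2012-01-18", "DE", "eBay", 9.4),
]


def pivot(rows):
    """Return (sellers, table) where table maps (dt, product) -> {seller: price}."""
    table = defaultdict(dict)
    sellers = set()
    for dt, product, seller, price in rows:
        table[(dt, product)][seller] = price
        sellers.add(seller)
    return sorted(sellers, key=str.lower), dict(table)


sellers, table = pivot(rows)
print("DT", "PRODUCT", *sellers, "LOWEST")
for (dt, product), prices in sorted(table.items()):
    lowest = min(prices, key=prices.get)    # seller with the smallest price
    cells = [prices.get(s) for s in sellers]  # None where a seller has no price
    print(dt, product, *cells, lowest)
```

Because the column set is decided at run time in ordinary code, the "unknown number of columns" problem does not arise.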
This is something that would probably work well with a PIVOT. Microsoft's docs are actually pretty useful on PIVOT and UNPIVOT.
http://technet.microsoft.com/en-us/library/ms177410(v=sql.105).aspx
Basically it allows you to pick a column, in your case SellerName, and pivot that out so that the elements of the column themselves become columns in the new result. The values that go in the new "Ebay", "Amazon", etc. columns would be an aggregate that you choose - in this case the MAX or MIN or AVG of the price.
For the final "Lowest Price" column you'd likely be best served by doing a subquery in your main query which finds the lowest value per product/date and then joining that back in to get the SellerName. Something like:
SELECT
Product_Price.dt
,Product_Price.Product
,Product_Price.SellerName AS MinimumSellerName
FROM
(SELECT
MIN(Price) AS min_price
,Product
,dt
FROM Product_Price
GROUP BY
Product
,dt) min_price
INNER JOIN Product_Price
ON min_price.Product = Product_Price.Product
AND min_price.dt = Product_Price.dt
AND min_price.min_price = Product_Price.Price
Then just put the pivot around that and include the MinimumSellerName column, just like you include date and product.
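The "lowest price, then join back for the seller" step can be verified in SQLite from Python. This sketch assumes SQLite, with `REAL` standing in for SQL Server's `money` type and the question's rows as data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Product_Price
                (id INT, dt TEXT, SellerName TEXT, Product TEXT, Price REAL)""")
conn.executemany("INSERT INTO Product_Price VALUES (?, ?, ?, ?, ?)", [
    (1, "2012-01-16", "Sears", "AA", 32), (2, "2012-01-16", "Amazon", "AA", 40),
    (3, "2012-01-16", "eBay", "AA", 27),
    (4, "2012-01-17", "Sears", "BC", 33.2), (5, "2012-01-17", "Amazon", "BC", 30),
    (6, "2012-01-17", "eBay", "BC", 51.4),
    (7, "2012-01-18", "Sears", "DE", 13.5), (8, "2012-01-18", "Amazon", "DE", 11.1),
    (9, "2012-01-18", "eBay", "DE", 9.4),
])

# The derived table finds the minimum per product/day; joining on the price
# as well as the keys recovers which seller offered that minimum.
cheapest = conn.execute("""
    SELECT p.dt, p.Product, p.SellerName AS MinimumSellerName
    FROM (SELECT Product, dt, MIN(Price) AS min_price
          FROM Product_Price
          GROUP BY Product, dt) m
    JOIN Product_Price p
      ON p.Product = m.Product
     AND p.dt = m.dt
     AND p.Price = m.min_price
    ORDER BY p.dt
""").fetchall()

print(cheapest)
```

If two sellers tie for the lowest price, this join returns both of them for that day.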

SQL query to separate a column into separate columns

I would like to have separate columns for H and T's prices, with 'period' as the common index. Any suggestions as to how I should go about this?
This is what my SQL query produces at the moment:
You can use GROUP BY and a conditional, like this:
SELECT
period
, SUM(CASE NAME WHEN 'H' THEN price ELSE 0 END) as HPrice
, SUM(CASE NAME WHEN 'T' THEN price ELSE 0 END) as TPrice
FROM MyTable
GROUP BY period
You can do the following:
SELECT period,
max(CASE WHEN name = 'H' THEN price END) as h_price,
max(CASE WHEN name = 'T' THEN price END) as t_price
FROM myTable
GROUP by period
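The two answers above differ in one detail: `ELSE 0` turns a missing name into 0, while a CASE without ELSE yields NULL. This SQLite sketch shows the difference; the period 4 row with no 'T' price is hypothetical, added only to expose it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myData (period INT, price REAL, name TEXT)")
conn.executemany("INSERT INTO myData VALUES (?, ?, ?)", [
    (1, 53.0450, "H"), (1, 55.7445, "T"),
    (2, 61.2827, "H"), (2, 66.0544, "T"),
    (3, 61.3405, "H"), (3, 66.0327, "T"),
    (4, 70.0, "H"),  # hypothetical period with no 'T' row
])

# SUM with ELSE 0: a missing name shows up as 0.
sum_version = conn.execute("""
    SELECT period,
           SUM(CASE name WHEN 'H' THEN price ELSE 0 END) AS HPrice,
           SUM(CASE name WHEN 'T' THEN price ELSE 0 END) AS TPrice
    FROM myData GROUP BY period ORDER BY period
""").fetchall()

# MAX without ELSE: the CASE is NULL for other names, so a missing name stays NULL.
max_version = conn.execute("""
    SELECT period,
           MAX(CASE WHEN name = 'H' THEN price END) AS h_price,
           MAX(CASE WHEN name = 'T' THEN price END) AS t_price
    FROM myData GROUP BY period ORDER BY period
""").fetchall()

print(sum_version[-1])
print(max_version[-1])
```

Both give the same prices when every period has exactly one H and one T row; pick the variant whose "missing" value (0 or NULL) suits your report.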
If you mean recreating the table:
1) Create a new table with columns period, price_h and price_t.
2) Copy all distinct period values into the new table's period column.
3) Copy each price where name = 'H' into the new table's price_h column, matching on period.
4) Repeat step 3 for price_t.
Good luck!
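The four steps for recreating the table translate fairly directly into SQL. Here is a sketch run in SQLite from Python; the new table name `prices_wide` is hypothetical, and step 3 is done with an UPDATE using a correlated subquery matched on period.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE myData (period INT, price REAL, name TEXT);
INSERT INTO myData VALUES (1, 53.0450, 'H'), (1, 55.7445, 'T'),
                          (2, 61.2827, 'H'), (2, 66.0544, 'T'),
                          (3, 61.3405, 'H'), (3, 66.0327, 'T');

-- 1) new table (hypothetical name)
CREATE TABLE prices_wide (period INT PRIMARY KEY, price_h REAL, price_t REAL);

-- 2) one row per distinct period
INSERT INTO prices_wide (period) SELECT DISTINCT period FROM myData;

-- 3) H prices, matched on period
UPDATE prices_wide
SET price_h = (SELECT price FROM myData m
               WHERE m.period = prices_wide.period AND m.name = 'H');

-- 4) same for T
UPDATE prices_wide
SET price_t = (SELECT price FROM myData m
               WHERE m.period = prices_wide.period AND m.name = 'T');
""")

wide = conn.execute("SELECT * FROM prices_wide ORDER BY period").fetchall()
print(wide)
```

A period with no matching row simply keeps NULL in that column, since the scalar subquery returns NULL.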
A little late to the game on this, but you could also pivot the data.
Let's create a sample table.
CREATE TABLE myData(period int, price decimal(12,4), name varchar(10))
GO
-- Inserting Data into Table
INSERT INTO myData
(period, price, name)
VALUES
(1, 53.0450, 'H'),
(1, 55.7445, 'T'),
(2, 61.2827, 'H'),
(2, 66.0544, 'T'),
(3, 61.3405, 'H'),
(3, 66.0327, 'T');
Now the select with the pivot performed.
SELECT period, H, T
FROM (
SELECT period, price, name
FROM myData) d
PIVOT (SUM(price) FOR name IN (H, T)) AS pvt
ORDER BY period
I've used this technique when I needed to build a dynamic SQL script that took in the columns to be displayed in the table header. No need for CASE statements.
I'm not sure how the performance of CASE and PIVOT compares. Maybe someone with a little more experience could comment on which performs better.