Sqlite query comparison multiple times - sql

I have the following schemas (sqlite):
JournalArticle(articleID, title, journal, volume, year, month)
ConferenceArticle(articleID, title, conference, year, location)
Person(name, affiliation)
Author(name, articleID)
I'm trying to get the names of all authors whose number of conference articles is >= their number of journal articles in every year from 2000 to 2018 inclusive. If an author has 0 articles in each category in a year, the condition still holds for that year. Only the years 2000-2018 matter.
The query would be much easier if it were over all years, since I could count the journal articles and conference articles, compare them, and then get the names. However, I'm stuck when trying to check this for every year from 2000 to 2018.
Of course, I don't want to repeat a query for each year. I feel like I may need to group by year, but I'm not sure. So far I've been able to get all articles of both types from 2000-2018 as one large table, but I'm not sure what to do next:
select articleID, year
from JournalArticle
where year >= 2000 and year <= 2018
union
select articleID, year
from ConferenceArticle
where year >= 2000 and year <= 2018

Hmmm. Let's start by getting a count for each author and year:
select a.name, jc.year, sum(jc.is_journal) as num_journal, sum(jc.is_conference) as num_conference
from (select ja.articleID, ja.year, 1 as is_journal, 0 as is_conference
from JournalArticle ja
union all
select ca.articleID, ca.year, 0 as is_journal, 1 as is_conference
from ConferenceArticle ca
) jc join
Author a
on a.articleID = jc.articleID
group by a.name, jc.year
Now aggregate again by author, keeping only those with no year in 2000-2018 where journal articles outnumber conference articles. Authors with no articles at all in that range do not appear in this result.
select ay.name
from (select a.name, jc.year, sum(jc.is_journal) as num_journal, sum(jc.is_conference) as num_conference
from (select ja.articleID, ja.year, 1 as is_journal, 0 as is_conference
from JournalArticle ja
union all
select ca.articleID, ca.year, 0 as is_journal, 1 as is_conference
from ConferenceArticle ca
) jc join
Author a
on a.articleID = jc.articleID
where jc.year >= 2000 and jc.year <= 2018
group by a.name, jc.year
) ay
group by ay.name
-- count the years that violate the condition; none are allowed
having sum(case when ay.num_conference < ay.num_journal then 1 else 0 end) = 0;
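To check the "every year" logic on a small case, here is a minimal sketch that runs this kind of query through Python's built-in sqlite3 module. The table and column names follow the question's schema; the sample rows and author names are invented for illustration. It counts, per author, the years in which journal articles outnumber conference articles and keeps only authors with no such year.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE JournalArticle(articleID, title, journal, volume, year, month);
CREATE TABLE ConferenceArticle(articleID, title, conference, year, location);
CREATE TABLE Author(name, articleID);
-- Hypothetical sample rows, invented for illustration only.
INSERT INTO JournalArticle VALUES
  (1, 'J1', 'JX', 1, 2001, 1),
  (2, 'J2', 'JX', 1, 2005, 1),
  (3, 'J3', 'JX', 2, 1999, 1);
INSERT INTO ConferenceArticle VALUES
  (10, 'C1', 'CX', 2001, 'Oslo'),
  (11, 'C2', 'CX', 2001, 'Oslo'),
  (12, 'C3', 'CX', 2010, 'Rome');
INSERT INTO Author VALUES
  ('alice', 1), ('alice', 10), ('alice', 11),
  ('bob', 2), ('bob', 12),
  ('carol', 3), ('carol', 12);
""")

QUERY = """
SELECT ay.name
FROM (SELECT a.name, jc.year,
             SUM(jc.is_journal) AS num_journal,
             SUM(jc.is_conference) AS num_conference
      FROM (SELECT articleID, year, 1 AS is_journal, 0 AS is_conference
            FROM JournalArticle
            UNION ALL
            SELECT articleID, year, 0, 1
            FROM ConferenceArticle) jc
      JOIN Author a ON a.articleID = jc.articleID
      WHERE jc.year BETWEEN 2000 AND 2018
      GROUP BY a.name, jc.year) ay
GROUP BY ay.name
-- A year "violates" the rule when journal articles outnumber conference ones.
HAVING SUM(CASE WHEN ay.num_conference < ay.num_journal THEN 1 ELSE 0 END) = 0
ORDER BY ay.name
"""

names = [row[0] for row in conn.execute(QUERY)]
print(names)
```

In this sample, bob is dropped because of 2005 (one journal article, no conference article), while carol's 1999 journal article is ignored because it falls outside the range.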

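Sounds like you could use a COALESCE in the GROUP BY. This returns one row per author and year in which conference articles are at least as numerous as journal articles: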
SELECT a.name,
COALESCE(j.year, c.year) as "year",
COUNT(j.articleID) AS JournalArticles,
COUNT(c.articleID) AS ConferenceArticles
FROM Author a
LEFT JOIN JournalArticle j ON (j.articleID = a.articleID AND j.year BETWEEN 2000 AND 2018)
LEFT JOIN ConferenceArticle c ON (c.articleID = a.articleID AND c.year BETWEEN 2000 AND 2018)
WHERE (j.year IS NOT NULL OR c.year IS NOT NULL)
GROUP BY a.name, COALESCE(j.year, c.year)
HAVING COUNT(c.articleID) >= COUNT(j.articleID)

Related

Group by join date + add month in where sql

I have the following table:
PERSON
ID
Name
date_created
date_left
What I want is a list of all months with the number of users who joined and the number of users who left in each.
I already have the following query, which returns the number of new users who joined in the month that I pass:
select MONTH(date_created) 'Month', YEAR(date_created) 'Year', count(*) as 'New Users'
from person p
where YEAR(date_created) = 2018 and MONTH( p.date_created) = 5
group by MONTH(date_created), YEAR(date_created)
It returns what I want.
How would I edit this to produce a report for the whole year and add a 'Users left' column next to the 'New users' one?
The result I'm after would look like this:
MONTH YEAR NEW USERS USERS LEFT
1 2019 10 5
I would "unpivot" the data using cross apply:
select v.[year], v.[month], sum(v.iscreated) as num_created,
sum(v.isleft) as num_left
from person p cross apply
(values (year(p.date_created), month(p.date_created), 1, 0),
(year(p.date_left), month(p.date_left), 0, 1)
) v([year], [month], iscreated, isleft)
group by v.[year], v.[month]
order by v.[year], v.[month];
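The same unpivot idea can be tried without SQL Server. The sketch below is an illustration only: it uses Python's built-in sqlite3 module, swaps CROSS APPLY (VALUES ...) for a plain UNION ALL (SQLite has no CROSS APPLY), and assumes the dates are stored as ISO 'YYYY-MM-DD' text. The sample rows are invented. Each person contributes one "created" row and, if they have left, one "left" row, and the outer query sums the two flags per month.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person(id INTEGER, name TEXT, date_created TEXT, date_left TEXT);
-- Hypothetical sample rows; dates are assumed to be ISO 'YYYY-MM-DD' text.
INSERT INTO person VALUES
  (1, 'a', '2019-01-05', '2019-02-10'),
  (2, 'b', '2019-01-20', NULL),
  (3, 'c', '2019-02-01', '2019-02-15');
""")

QUERY = """
SELECT yr, mon, SUM(iscreated) AS num_created, SUM(isleft) AS num_left
FROM (SELECT CAST(strftime('%Y', date_created) AS INTEGER) AS yr,
             CAST(strftime('%m', date_created) AS INTEGER) AS mon,
             1 AS iscreated, 0 AS isleft
      FROM person
      UNION ALL
      -- People who have not left yet would otherwise form a NULL month.
      SELECT CAST(strftime('%Y', date_left) AS INTEGER),
             CAST(strftime('%m', date_left) AS INTEGER),
             0, 1
      FROM person
      WHERE date_left IS NOT NULL) v
GROUP BY yr, mon
ORDER BY yr, mon
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Filtering out NULL date_left values keeps people who are still present from being counted under a NULL year and month.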
The straightforward approach is probably to full outer join the monthly counts of users who joined with the monthly counts of users who left. SQL Server makes this a bit awkward by not featuring USING, so we must use ON and COALESCE on month and year instead.
select
coalesce(pin.year, pout.year) as year,
coalesce(pin.month, pout.month) as month,
coalesce(pin.cnt, 0) as count_in,
coalesce(pout.cnt, 0) as count_out
from
(
select year(date_created) as year, month(date_created) as month, count(*) as cnt
from person
group by year(date_created), month(date_created)
) pin
full outer join
(
select year(date_left) as year, month(date_left) as month, count(*) as cnt
from person
group by month(date_left), year(date_left)
) pout on pout.year = pin.year and pout.month = pin.month
order by year, month;
Maybe you could do it with a subselect? I tried it just now with Oracle syntax; I'm not sure whether it works in SQL Server.
SELECT * FROM
(
select MONTH(date_created) 'Month_C', YEAR(date_created) 'Year_C', count(*) as 'New Users'
from person p
where YEAR(date_created) = 2018 and MONTH( p.date_created) = 5
group by MONTH(date_created), YEAR(date_created)
)created_user,
(
select MONTH(date_left) 'Month_L', YEAR(date_left) 'Year_L', count(*) as 'Users Left'
from person p
where YEAR(date_left) = 2018 and MONTH( p.date_left) = 5
group by MONTH(date_left), YEAR(date_left)
) left_user
where created_user.Year_C = left_user.Year_L
and created_user.Month_C = left_user.Month_L

SQL Editing Year

The query works but only gives values for the year 1985. How do I extend it to cover a range of years (1985-2014)?
use baseball;
SELECT CAST(tf.franchname AS CHAR(20)), s.yearID, s.lgid, AVG(s.salary)
FROM salaries s, teams t, teamsfranchises tf
WHERE s.teamID = t.teamID AND
t.franchID = tf.franchID AND
s.yearID = 1985 AND
(s.lgid='AL' OR s.lgid='NL')
GROUP BY tf.franchname, s.yearID, s.lgid
ORDER BY s.yearID;
You could just use BETWEEN.
Your where clause should then look like
(s.yearID BETWEEN 1985 AND 2014) and
Alternatively, you could use the >= and <= operators:
(s.yearID >= 1985 and s.yearID <= 2014)
If for any reason you don't have a continuous range of years (say you only want 5 specific years), IN could also be an option:
s.yearID IN (1984, 1991, 1996, 2001, 2006)
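The three filter styles above can be compared side by side. This is a minimal sketch using Python's built-in sqlite3 module with a hypothetical one-table stand-in for salaries (one invented row per year from 1980 to 2019); the helper name years is also just for illustration. It shows that BETWEEN is inclusive at both ends and matches the >= / <= form, while IN picks out individual years.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE salaries(yearID INTEGER, salary REAL)")
# Hypothetical data: one row per year from 1980 through 2019.
conn.executemany(
    "INSERT INTO salaries VALUES (?, ?)",
    [(y, 1000.0 + y) for y in range(1980, 2020)],
)


def years(where):
    """Return the sorted yearID values matching a WHERE fragment."""
    sql = "SELECT yearID FROM salaries WHERE " + where + " ORDER BY yearID"
    return [r[0] for r in conn.execute(sql)]


between = years("yearID BETWEEN 1985 AND 2014")
compare = years("yearID >= 1985 AND yearID <= 2014")
picked = years("yearID IN (1985, 1991, 1996)")

print(between == compare, between[0], between[-1], picked)
```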
Your query filters on a single year with the condition s.yearID = 1985. You may want to replace it with BETWEEN, or remove it altogether, depending on your needs.
select cast(tf.franchname as char(20)), s.yearID, s.lgid, avg(s.salary)
from salaries s, teams t, teamsfranchises tf
where s.teamID = t.teamID and
t.franchID = tf.franchID and
(s.yearID between 1985 and 2014) and
(s.lgid='AL' OR s.lgid='NL')
group by tf.franchname, s.yearID, s.lgid
order by s.yearID;
Here is another approach, for when some years have no data and you still want those years listed with a zero count. Just check this link.
You can create a temporary table that returns the list of your years, i.e. 1985 to 2015, then left outer join your query to it and see the magic.
I just used your query here; you can substitute the accepted answer's query too.
Declare @Startyear int = 1985
--1st approach to get continuous years
;with yearlist as
(
select 1985 as year
union all
select yl.year + 1 as year
from yearlist yl
where yl.year + 1 <= YEAR(GetDate())
)
select year from yearlist order by year desc;
--2nd approach to get continuous years
;WITH n(n) AS
(
SELECT 0
UNION ALL
SELECT n+1 FROM n WHERE n < (year(getdate()) - @Startyear)
)
SELECT year(DATEADD( YY, -n, GetDate()))
FROM n ORDER BY n
--take either approach and then join with your query
;with yearlist as
(
select 1985 as year
union all
select yl.year + 1 as year
from yearlist yl
where yl.year + 1 <= YEAR(GetDate())
)
select yearlist.year, yourtable.franchname, yourtable.lgid, yourtable.avg_salary
from yearlist
left join
(
-- derived-table columns need names, and ORDER BY is not allowed in here without TOP
SELECT CAST(tf.franchname AS CHAR(20)) AS franchname, s.yearID, s.lgid, AVG(s.salary) AS avg_salary
FROM salaries s, teams t, teamsfranchises tf
WHERE s.teamID = t.teamID AND
t.franchID = tf.franchID AND
(s.lgid='AL' OR s.lgid='NL')
GROUP BY tf.franchname, s.yearID, s.lgid
) yourtable on yourtable.yearID = yearlist.year
order by yearlist.year desc;
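To see why the year list matters, here is a small sketch of the same pattern in SQLite via Python's built-in sqlite3 module. It is an illustration under assumptions: the salaries table is reduced to two columns, the sample rows are invented, the year range is fixed at 1985-1988 instead of using the current date, and SQLite's WITH RECURSIVE stands in for the SQL Server CTE. Years with no salary rows still appear, with a count of 0 and a NULL average.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE salaries(yearID INTEGER, salary REAL);
-- Hypothetical sample rows: nothing recorded for 1986 or 1988.
INSERT INTO salaries VALUES (1985, 100.0), (1985, 300.0), (1987, 500.0);
""")

QUERY = """
WITH RECURSIVE yearlist(year) AS (
    SELECT 1985
    UNION ALL
    SELECT year + 1 FROM yearlist WHERE year + 1 <= 1988
)
SELECT yl.year,
       COUNT(s.salary) AS n,        -- counts only matched rows, so gaps give 0
       AVG(s.salary)   AS avg_salary
FROM yearlist yl
LEFT JOIN salaries s ON s.yearID = yl.year
GROUP BY yl.year
ORDER BY yl.year
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Counting a column from the joined table, not COUNT(*), is what makes the empty years report 0 instead of 1.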

SQL query of sales by customer over multiple years for top 50 customers in one year

I have a query in SQL server that successfully returns the top 50 customers for a given year by sales. I want to expand it to return their sales for the additional years when they may or may not be in the top 50.
SELECT TOP 50 CU.CustomerName, SUM(ART.SalesAnalysis) AS '2011'
FROM ARTransaction AS ART, Customer AS CU
WHERE ART.CustomerN = CU.CustomerN AND ART.PostingDate BETWEEN '2010-12-31' AND '2012-01-01'
GROUP By CU.CustomerName
ORDER BY SUM(ART.SalesAnalysis) DESC
I tried adding nested SELECT statements, but they return strange results and I'm not sure why (this might never work, but the results have me flabbergasted anyway). When they are included, the values in every row change and customers are duplicated.
(SELECT SUM(ART.SalesAnalysis)
WHERE ART.PostingDate BETWEEN '2011-12-31' AND '2013-01-01') AS '2012'
I tried to put a TOP statement in a nested SELECT in HAVING but that tells me
"Msg 8114, Level 16, State 5, Line 1
Error converting data type varchar to numeric."
SELECT CU.CustomerName, SUM(ART.SalesAnalysis) AS '2011'
FROM ARTransaction AS ART
JOIN Customer AS CU ON ART.CustomerN = CU.CustomerN
GROUP BY CU.CustomerNAme
HAVING CU.CustomerNAme IN
(SELECT TOP 50 CU.CustomerName
FROM ARTransaction
JOIN Customer ON ARTransaction.CustomerN = Customer.CustomerN
WHERE ARTransaction.SalesAnalysis BETWEEN '2010-12-31' AND '2012-01-01'
GROUP BY Customer.CustomerN
ORDER BY SUM(ART.SalesAnalysis) DESC)
If I understand correctly, you are looking for the top 50 sales for customers based on 2011 data - and want to see all years data for those top 50 from 2011, regardless of those customers being in the top 50 for other years.
Try this, it might need to be tweaked a bit as I don't know the schema, but if I understand the question correctly, this should do the trick.
WITH Top50 AS (
SELECT TOP 50
CU.CustomerN
,SUM(ART.SalesAnalysis) AS SalesTotal
FROM
ARTransaction art
INNER JOIN
Customer cu
ON cu.CustomerN = art.CustomerN
WHERE
ART.PostingDate BETWEEN CAST('2011-01-01' AS DATETIME)
AND CAST('2011-12-31' AS DATETIME)
GROUP BY
CU.CustomerN
ORDER BY
SUM(ART.SalesAnalysis) DESC)
SELECT
c.CustomerName
,SUM(a.SalesAnalysis) AS TotalSales
,YEAR(a.PostingDate) AS PostingDateYear
FROM
ARTransaction a
INNER JOIN
Customer c
ON c.CustomerN = a.CustomerN
INNER JOIN
Top50 t
ON t.CustomerN = a.CustomerN
GROUP BY
c.CustomerName
,YEAR(a.PostingDate)
ORDER BY
PostingDateYear
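The two-step shape of this answer (pick the top customers for one year, then report every year for just those customers) can be tried on a toy dataset. The sketch below is an assumption-laden illustration: it uses Python's built-in sqlite3 module, so LIMIT replaces TOP and strftime replaces YEAR(); it takes the top 2 instead of the top 50; the rows are invented; and dates are assumed to be ISO text. It also filters the year with a half-open range, which avoids missing rows that carry a time on December 31.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer(CustomerN INTEGER, CustomerName TEXT);
CREATE TABLE ARTransaction(CustomerN INTEGER, PostingDate TEXT, SalesAnalysis REAL);
-- Hypothetical sample rows.
INSERT INTO Customer VALUES (1, 'Acme'), (2, 'Bolt'), (3, 'Cog');
INSERT INTO ARTransaction VALUES
  (1, '2011-03-01', 500.0), (1, '2012-03-01', 50.0),
  (2, '2011-06-01', 300.0), (2, '2012-06-01', 900.0),
  (3, '2011-09-01', 100.0), (3, '2012-09-01', 999.0);
""")

QUERY = """
WITH TopN AS (
    -- Step 1: rank customers by 2011 sales only.
    SELECT CustomerN
    FROM ARTransaction
    WHERE PostingDate >= '2011-01-01' AND PostingDate < '2012-01-01'
    GROUP BY CustomerN
    ORDER BY SUM(SalesAnalysis) DESC
    LIMIT 2
)
-- Step 2: report every year, restricted to those customers.
SELECT c.CustomerName,
       CAST(strftime('%Y', a.PostingDate) AS INTEGER) AS PostingYear,
       SUM(a.SalesAnalysis) AS TotalSales
FROM ARTransaction a
JOIN Customer c ON c.CustomerN = a.CustomerN
JOIN TopN t ON t.CustomerN = a.CustomerN
GROUP BY c.CustomerName, strftime('%Y', a.PostingDate)
ORDER BY PostingYear, c.CustomerName
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Cog has the largest 2012 sales in this sample but is excluded, because membership in the list is decided by 2011 alone.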
You could use something like the query below. It looks at all the data for the years you are interested in and then takes the top 50 by 2011 sales:
SELECT TOP 50
CU.CustomerName,
SUM(case when year(ART.PostingDate) = 2011 -- or you could use case when ART.PostingDate BETWEEN '2011-01-01' AND '2011-12-31'
then ART.SalesAnalysis
else 0 end) AS [2011],
SUM(case when year(ART.PostingDate) = 2012
then ART.SalesAnalysis
else 0 end) AS [2012]
FROM
ARTransaction ART
inner join Customer CU
on ART.CustomerN = CU.CustomerN
WHERE ART.PostingDate BETWEEN '2011-01-01' AND '2012-12-31'
GROUP By CU.CustomerName
ORDER BY
SUM(case when year(ART.PostingDate) = 2011
then ART.SalesAnalysis
else 0 end) DESC

Select highest profit from each year SQL

How do I obtain the highest value for each year within a table? Let's say we have a table Movies and I want to find the most profitable film for each year.
This is my attempt so far:
SELECT year, MAX(income - cost) AS profit, title
FROM Movies m, Movies m2
GROUP BY year
I am pretty certain it is going to need some sub selects but I can't visualise what I need to do. I was also thinking probably some sort of distinct option to rule out duplicate years.
Title Year Income Cost Length
A 2000 10 2 2
B 2000 9 7 2
So from this the expected result would be
Title Year Profit
A 2000 8
I'm guessing slightly at what you want, but since you've not specified any RDBMS a generic solution would be:
SELECT m.Year, (m.Income - m.Cost) AS Profit, m.Title
FROM Movies m
INNER JOIN
( SELECT Year, MAX(Income - Cost) AS Profit
FROM Movies
GROUP BY Year
) MaxProfit
ON MaxProfit.Year = m.Year
AND MaxProfit.Profit = (m.Income - m.Cost)
ORDER BY m.Year
You can also do this using analytic functions if your DBMS permits. e.g. SQL-Server
WITH MovieCTE AS
( SELECT m.Year,
Profit = (m.Income - m.Cost),
m.Title,
RowNumber = ROW_NUMBER() OVER(PARTITION BY m.Year ORDER BY (m.Income - m.Cost) DESC)
FROM Movies m
)
SELECT year, Profit, Title
FROM MovieCTE
WHERE RowNumber = 1
It is possible I have misunderstood your exact criteria, but I am sure the same principles can be applied; you will just need to alter the grouping and the join in the first example, or the PARTITION BY in the second.
select m1.year, m1.profit, m2.title
from
(select year, max(income - cost) as profit from movies group by year) m1
join
(select year, (income - cost) as profit, title from movies) m2
on
m1.year = m2.year and m1.profit = m2.profit
This will give the highest profit movie for each year, and choose the first title in the event of a tie:
select a.year, a.profit,
(select min(title) from Movies where year = a.year and income - cost = a.profit) as title
from (
select year, max(income - cost) as profit
from Movies -- title, year, cost, income, number
group by year
) as a
order by year desc
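The difference between the join-back approach and the MIN(title) tie-break shows up only when two films share the top profit in a year. Here is a minimal sketch using Python's built-in sqlite3 module. Rows A and B come from the question; rows C and D are a hypothetical tie added for illustration. The join returns every tied film, while the correlated subquery returns one title per year.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Movies(title TEXT, year INTEGER, income INTEGER, cost INTEGER, length INTEGER);
INSERT INTO Movies VALUES
  ('A', 2000, 10, 2, 2),   -- from the question
  ('B', 2000, 9, 7, 2),    -- from the question
  ('C', 2001, 5, 1, 2),    -- hypothetical: ties with D on profit 4
  ('D', 2001, 6, 2, 2);
""")

# Join each film back to its year's maximum profit: ties all survive.
JOIN_QUERY = """
SELECT m.year, m.income - m.cost AS profit, m.title
FROM Movies m
JOIN (SELECT year, MAX(income - cost) AS profit
      FROM Movies
      GROUP BY year) mp
  ON mp.year = m.year AND mp.profit = m.income - m.cost
ORDER BY m.year, m.title
"""

# Pick the alphabetically first title among the tied films.
MIN_TITLE_QUERY = """
SELECT a.year, a.profit,
       (SELECT MIN(title) FROM Movies
        WHERE year = a.year AND income - cost = a.profit) AS title
FROM (SELECT year, MAX(income - cost) AS profit
      FROM Movies
      GROUP BY year) a
ORDER BY a.year
"""

all_ties = conn.execute(JOIN_QUERY).fetchall()
one_per_year = conn.execute(MIN_TITLE_QUERY).fetchall()
print(all_ties)
print(one_per_year)
```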

TERADATA: Aggregate across multiple tables

Consider the following query where aggregation happens across two tables: Sales and Promo and the aggregate values are again used in a calculation.
SELECT
sales.article_id,
avg((sales.euro_value - ZEROIFNULL(promo.euro_value)) / NULLIFZERO(sales.qty - ZEROIFNULL(promo.qty)))
FROM
( SELECT
sales.article_id,
sum(sales.euro_value) as euro_value,
sum(sales.qty) as qty
from SALES_TABLE sales
where year >= 2011
group by article_id
) sales
LEFT OUTER JOIN
( SELECT
promo.article_id,
sum(promo.euro_value) as euro_value,
sum(promo.qty) as qty
from PROMOTION_TABLE promo
where year >= 2011
group by article_id
) promo
ON sales.article_id = promo.article_id
GROUP BY sales.article_id;
Some notes on the query:
Both inner queries return a huge number of rows because of the large number of articles. According to EXPLAIN on Teradata, the inner queries themselves take very little time, but the join takes a long time.
Assume a primary key on article_id is present and both tables are partitioned by year.
A left outer join is used because the second table contains optional data.
So, can you suggest a better way of writing this query? Thanks for reading this far :)
Not really sure how the avg function got into the mix, so I'm removing it.
SELECT article_id,
(SUM(sales_value) - SUM(promo_value)) /
(SUM(sales_qty) - SUM(promo_qty))
FROM (
SELECT
article_id,
sum(euro_value) AS sales_value,
sum(qty) AS sales_qty,
0 AS promo_value,
0 AS promo_qty
from SALES_TABLE sales
where year >= 2011
group by article_id
UNION ALL
SELECT
article_id,
0 AS sales_value,
0 AS sales_qty,
sum(euro_value) AS promo_value,
sum(qty) AS promo_qty
from PROMOTION_TABLE promo
where year >= 2011
group by article_id
) AS comb
GROUP BY article_id;
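The UNION ALL rewrite can be checked on a few rows. The sketch below is only an illustration: it runs in SQLite through Python's built-in sqlite3 module, so it says nothing about Teradata performance; the sample rows are invented; and the standard NULLIF(x, 0) stands in for Teradata's NULLIFZERO to guard against division by zero. Sales rows carry zeros in the promo columns and promo rows carry zeros in the sales columns, so one GROUP BY replaces the join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SALES_TABLE(article_id INTEGER, year INTEGER, euro_value REAL, qty INTEGER);
CREATE TABLE PROMOTION_TABLE(article_id INTEGER, year INTEGER, euro_value REAL, qty INTEGER);
-- Hypothetical sample rows; the 2010 row is filtered out by year >= 2011.
INSERT INTO SALES_TABLE VALUES
  (1, 2011, 100.0, 10), (1, 2012, 50.0, 5), (2, 2011, 40.0, 4), (1, 2010, 999.0, 99);
INSERT INTO PROMOTION_TABLE VALUES
  (1, 2011, 30.0, 5);
""")

QUERY = """
SELECT article_id,
       (SUM(sales_value) - SUM(promo_value))
         / NULLIF(SUM(sales_qty) - SUM(promo_qty), 0) AS unit_value
FROM (SELECT article_id,
             SUM(euro_value) AS sales_value, SUM(qty) AS sales_qty,
             0 AS promo_value, 0 AS promo_qty
      FROM SALES_TABLE
      WHERE year >= 2011
      GROUP BY article_id
      UNION ALL
      SELECT article_id, 0, 0, SUM(euro_value), SUM(qty)
      FROM PROMOTION_TABLE
      WHERE year >= 2011
      GROUP BY article_id) comb
GROUP BY article_id
ORDER BY article_id
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

One behavioural difference from the left join is worth keeping in mind: an article that appears only in the promotion table would still produce a row here, whereas the left join from sales would drop it.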