Get row if a number is inside a string range from a column - sql

My table (with simplified columns) has this structure:
brand | period    | fuel
Audi  | 2008-2016 | G
BWM   | 2018-     | D
The user will give me the matriculation year of the car and I want to look inside all the periods and find if the given year is inside the range.
For example 2010 would return me the Audi and 2020 should return the BWM.
My current query looks like the following:
SELECT *
FROM cars
WHERE brand='Audi' AND fuel='G' AND
<userIntroducedYear> BETWEEN <year1> AND <year2>;
My guess is that I should be able to get this with some subquery instead of the BETWEEN but I'm a bit lost doing it.
Thanks in advance.

You should fix your data model so that you are:
- Storing number values as numbers (years are numbers, not strings).
- Storing only one value in a scalar column.
Postgres also offers range data types, which do exactly what you want.
For this, though, I'll just parse period into the period start and end years using split_part() and a lateral join to do the work in the FROM clause:
select c.*
from cars c cross join lateral
(values (split_part(c.period, '-', 1)::int, nullif(split_part(c.period, '-', 2), '')::int)
) v(period_start, period_end)
where c.brand = 'Audi' and c.fuel = 'G' and
v.period_start <= :userIntroducedYear and
(v.period_end >= :userIntroducedYear or v.period_end is null);
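To make the range-type suggestion concrete, here is a minimal sketch, assuming PostgreSQL and assuming the period is stored as an int4range instead of text. The inline cars data is only a stand-in for the real table, and the '[]' bounds flag treats both years as inclusive, which is an assumption about what "2008-2016" means.

```sql
-- Hypothetical remodelled data: period is an int4range, not a string.
WITH cars (brand, period, fuel) AS (
    VALUES ('Audi', int4range(2008, 2016, '[]'), 'G'),  -- 2008 to 2016 inclusive
           ('BWM',  int4range(2018, NULL, '[)'), 'D')   -- 2018 onwards, no upper bound
)
SELECT brand, fuel
FROM cars
WHERE period @> 2010;  -- @> tests whether the range contains the given year
```

With 2010 this should return the Audi row; replacing the year with 2020 should return the open-ended BWM row.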

SQL: Calculate the rating based on different columns and use it as an argument

I'm trying to calculate the rating based on a table that has 3 columns with different ratings ranging from 1 to 5.
I wanted to calculate the average of these 3 values and then be able to use this as an argument in queries, for example:
Where Rating >3.5
At this moment I have this that gives me the average for all suppliers
SELECT c.Name
,(SELECT CAST(AVG(rat) AS DECIMAL(5, 2))
FROM(
VALUES(b.Qty_Price),
(b.Quality),
(b.DeliveryTime)) A (rat)) AS Rating
FROM Order a
JOIN Evaluation b ON b.ID_Evaluation = a.ID_Evaluation
JOIN Supplier c ON c.NIF_Supplier = a.NIF_Supplier
What I would like now is, for example, to filter the suppliers that have a rating above 3, but I don't know how to do that. If anyone can help, I would be grateful.
If the query works the way you want, it gives you the average rating for every entry.
WHERE rating > 3.5 cannot simply be appended to it: rating is an alias defined in the SELECT list, which is evaluated after the WHERE clause, and none of the joined tables has a column with that name.
To overcome this, keep the query you have, give it a name using WITH, and then SELECT from that named subquery WHERE rating > 3.5.
It should look something like this:
WITH Averages(name, rating) AS
(SELECT c.name
,(SELECT CAST(AVG(rat) AS DECIMAL(5, 2))
FROM(
VALUES(b.qty_Price),
(b.quality),
(b.deliveryTime)) AS A (rat)) AS rating
FROM Order a
JOIN Evaluation b ON b.ID_Evaluation = a.ID_Evaluation
JOIN Supplier c ON c.NIF_Supplier = a.NIF_Supplier)
SELECT name, rating FROM Averages WHERE rating > 3.5;
Now we simply refer to the query you provided as Averages, and we SELECT from it WHERE rating > 3.5.
Also note that you can have multiple WITH queries to make things easier for you, but remember that a comma (,) is needed to separate them. In our case there is only one WITH ... AS, so no comma or semicolon is needed after ...= a.NIF_Supplier)
The "A" before "(rat)" is the alias of the VALUES list and has to stay; AS is optional there in PostgreSQL and SQL Server, so both "A (rat)" and "AS A (rat)" work, but "AS (rat)" on its own does not. If the table really is named Order, its name has to be quoted (for example "Order" in standard SQL), because ORDER is a reserved word. Writing column names in lowercase is only a convention, but it makes it easier to distinguish tables from attributes.
Cheers!
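As a self-contained illustration of the WITH approach, here is a minimal sketch written for PostgreSQL. The evaluation data, the supplier names and the alias v are made up for the example; only the column names and the averaging subquery follow the question.

```sql
-- Hypothetical sample data standing in for the joined Order/Evaluation/Supplier tables.
WITH evaluation (supplier, qty_price, quality, deliverytime) AS (
    VALUES ('Acme',   5, 4, 4),
           ('Globex', 2, 3, 3)
),
averages AS (
    SELECT e.supplier,
           -- Average of the three rating columns of the current row.
           (SELECT CAST(AVG(rat) AS DECIMAL(5, 2))
            FROM (VALUES (e.qty_price), (e.quality), (e.deliverytime)) AS v (rat)) AS rating
    FROM evaluation e
)
-- The alias rating is an ordinary column here, so WHERE can use it.
SELECT supplier, rating
FROM averages
WHERE rating > 3.5;
```

Only the supplier whose three ratings average above 3.5 should remain, here the made-up "Acme" row.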

Postgresql ARRAY_AGG on array only returns first value

In Postgres 10 I'm having an issue converting an integer to a weekday name and grouping all record values via ARRAY_AGG to form a string.
The following subquery only returns the first value in the arrays indexed by timetable_periods.day (which is an integer)
SELECT ARRAY_TO_STRING(ARRAY_AGG((ARRAY['Mon','Tue','Wed','Thu','Fri','Sat','Sun'])[timetable_periods.day]), '-')
FROM timetable_periods
WHERE courses.id = timetable_periods.course_id
GROUP BY timetable_periods.course_id
whereas this shows all days concatenated in a string, as expected:
SELECT ARRAY_TO_STRING(ARRAY_AGG(timetable_periods.day), ', ')
FROM timetable_periods
WHERE courses.id = timetable_periods.course_id
GROUP BY timetable_periods.course_id
E.G. A Course has 2 timetable_periods, with day values 0 and 2 (i.e. Monday and Wednesday)
The first query only returns "Tue" instead of "Mon, Wed" (so both an indexing issue and only returning the first day).
The second query returns "0, 2" as expected
Am I doing something wrong in the use of ARRAY( with the weeknames?
Thanks
Update: The queries above are subqueries, with the courses table in the main query's FROM
You should post correct SQL statements. I suspect a JOIN of courses and timetable_periods, but courses is missing in the FROM clause. Furthermore, both queries contain AND followed by GROUP BY - this will not work.
From your writings I guess you want something like:
select
c.id,
string_agg((array['Mon','Tue','Wed','Thu','Fri','Sat','Sun'])[tp.day + 1], ', ') as day_names
from
courses c
inner join timetable_periods tp on c.id = tp.course_id
group by
c.id
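Your approach to looking up the day names in an array was almost correct, but PostgreSQL arrays are 1-based by default, so a day value of 0 needs the index tp.day + 1. With the original index, day 0 reads element 0, which is NULL, and array_to_string skips NULL elements, while day 2 reads the second element, 'Tue'; that is why only "Tue" came back. Concatenating text values can be done directly with string_agg.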
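The off-by-one effect can be seen in isolation with a small sketch, assuming PostgreSQL. The inline list of day values (0 and 2) mirrors the example from the question; the alias d and the output column names are made up.

```sql
-- Compare indexing with the raw 0-based day value against day + 1.
SELECT d.day,
       (ARRAY['Mon','Tue','Wed','Thu','Fri','Sat','Sun'])[d.day]     AS zero_based_lookup,
       (ARRAY['Mon','Tue','Wed','Thu','Fri','Sat','Sun'])[d.day + 1] AS one_based_lookup
FROM (VALUES (0), (2)) AS d (day)
ORDER BY d.day;
```

For day 0 the first lookup is NULL and the second is 'Mon'; for day 2 the first lookup is 'Tue' and the second is 'Wed'.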

SQL - sum column for every date

This seemed like a very easy thing to do but I got stuck. I have a query like this:
select op.date, count(p.numberofoutstanding)
from people p
left join outstandingpunches op
on p.fullname = op.fullname
group by op.date
That outputs a table like this:
How can I sum over the dates so the sum for each row is equal to the sums up to that date? For example, the first column would be 27, the second would be 27 + 4, the third 27 + 4 + 11, etc.
I encountered this and this question, and I saw people are using OVER in their queries for this, but I'm confused by what do I have to partition. I tried partitioning by date but it's giving me incorrect results.
You can use a cumulative sum. This looks like:
select op.date, count(*),
sum(count(*)) over (order by op.date) as running_count
from people p join
outstandingpunches op
on p.fullname = op.fullname
group by op.date;
Note: I changed the join from a left join to an inner join. You are aggregating by a column in the second table. Your results have no examples of a NULL date column and that doesn't seem useful. Hence, it seems that rows are assumed to match.
I believe you need to use sum and not count.
select o.date_c,
sum(sum(p.numberofoutstanding)) over (order by o.date_c)
from people p
left join outstandingpunches o on p.fullname = o.fullname
group by o.date_c;
Keep in mind that I have renamed your column date to date_c. I believe you should not use data type names as column names.
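On the question of what to partition by: a running total needs ORDER BY inside OVER, not PARTITION BY on the date. The sketch below, assuming PostgreSQL, contrasts the two on made-up dates carrying the counts 27, 4 and 11 mentioned in the question; the names daily, day_c and cnt are hypothetical.

```sql
-- Hypothetical per-date counts, standing in for the grouped query result.
WITH daily (day_c, cnt) AS (
    VALUES (DATE '2020-01-01', 27),
           (DATE '2020-01-02', 4),
           (DATE '2020-01-03', 11)
)
SELECT day_c,
       cnt,
       SUM(cnt) OVER (ORDER BY day_c)     AS running_total,   -- accumulates up to each date
       SUM(cnt) OVER (PARTITION BY day_c) AS per_date_total   -- restarts for every date
FROM daily
ORDER BY day_c;
```

The running_total column comes out as 27, 31, 42, while per_date_total just repeats 27, 4, 11, which is why partitioning by the date does not help here.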

Grouping a percentage calculation in postgres/redshift

I keep running in to the same problem over and over again, hoping someone can help...
I have a large table with a category column that has 28 entries for donkey breed, then I'm counting two specific values grouped by each of those categories in subqueries like this:
WITH totaldonkeys AS (
SELECT donkeybreed,
COUNT(*) AS total
FROM donkeytable1
GROUP BY donkeybreed
)
,
sickdonkeys AS (
SELECT donkeybreed,
COUNT(*) AS totalsick
FROM donkeytable1
JOIN donkeyhealth on donkeytable1.donkeyid = donkeyhealth.donkeyid
WHERE donkeyhealth.sick IS TRUE
GROUP BY donkeybreed
)
,
It's my goal to end up with a table that has primarily the percentage of sick donkeys for each breed but I always end up struggling like hell with the problem of not being able to group by without using an aggregate function which I cannot do here:
SELECT (CAST(sickdonkeys.totalsick AS float) / totaldonkeys.total) * 100 AS percentsick,
totaldonkeys.donkeybreed
FROM totaldonkeys, sickdonkeys
GROUP BY totaldonkeys.donkeybreed
When I run this I end up with 28 results for each breed of donkey, one correct I believe but obviously hundreds of useless datapoints.
I know I'm probably being really dumb here but I keep hitting in to this same problem again and again with new donkeydata, I should obviously be structuring the whole thing a new way because you just can't do this final query without an aggregate function, I think I must be missing something significant.
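The 28 results per breed come from FROM totaldonkeys, sickdonkeys, which pairs every row of one subquery with every row of the other because nothing links them on donkeybreed. You can avoid the two subqueries altogether and compute the proportion that are sick directly from the donkeyhealth table (multiply by 100 for a percentage):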
SELECT d.donkeybreed,
AVG( (dh.sick)::int ) AS proportion_sick
FROM donkeytable1 d JOIN
donkeyhealth dh
ON d.donkeyid = dh.donkeyid
GROUP BY d.donkeybreed
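If you prefer to keep the two subqueries, a sketch of the fix is to join them on donkeybreed instead of listing them with a comma. This assumes PostgreSQL; the two inline tables with made-up breeds only stand in for the real donkeytable1 and donkeyhealth, and the LEFT JOIN with COALESCE is my choice so that breeds without sick donkeys show 0 instead of disappearing.

```sql
-- Hypothetical sample data with the column names from the question.
WITH donkeytable1 (donkeyid, donkeybreed) AS (
    VALUES (1, 'Poitou'), (2, 'Poitou'), (3, 'Mammoth'), (4, 'Mammoth')
),
donkeyhealth (donkeyid, sick) AS (
    VALUES (1, TRUE), (2, FALSE), (3, FALSE), (4, FALSE)
),
totaldonkeys AS (
    SELECT donkeybreed, COUNT(*) AS total
    FROM donkeytable1
    GROUP BY donkeybreed
),
sickdonkeys AS (
    SELECT donkeybreed, COUNT(*) AS totalsick
    FROM donkeytable1
    JOIN donkeyhealth ON donkeytable1.donkeyid = donkeyhealth.donkeyid
    WHERE donkeyhealth.sick IS TRUE
    GROUP BY donkeybreed
)
-- One row per breed: each total is matched only with the sick count of the same breed.
SELECT t.donkeybreed,
       100.0 * COALESCE(s.totalsick, 0) / t.total AS percentsick
FROM totaldonkeys t
LEFT JOIN sickdonkeys s ON s.donkeybreed = t.donkeybreed
ORDER BY t.donkeybreed;
```

With this sample data the result has exactly two rows, 0 percent for 'Mammoth' and 50 percent for 'Poitou', and no GROUP BY is needed in the final SELECT.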

SQL - count amount of occurences for items in different price diapasons

I have a question about SQL, and I honestly tried to search methods before asking. I will give an abstract (but precise) description below, and will greatly appreciate your example of solution (SQL query).
What I have:
Table A with category ids of the items and prices (in USD) for each item. category id has an int type of value; price is a string and looks like "USD 200000000" (the real value multiplied by 10^7). The table also has a kind column with an int type of value.
Table B with the relation between category id and name.
What I need:
Get a table with price ranges (diapasons, like 0-100 | 100-200 | ...) as column names and the count of items for each category id (as row names) in each of the price ranges. All results must be filtered by the kind parameter (from table A) with value 3.
Questions that I encountered (and which caused me to ask for an example of an SQL query):
1. Cut "USD " from the price string value, divide it by 10^7 and convert to float.
2. Gather ranges of price values (0-100 | 100-200 | ...), with a given step in a given interval (the max price is considered unknown at the start). Example: step 100 on the 0-500 interval, and step 200 for values >500.
3. Put the price ranges into the column names of the result table.
4. For each range, count the number of items in each category (category_id). The left limit of a range shall not be included (e.g. for the 1000-1200 range, items with price 1000 shall not be counted).
5. Using table B, display name instead of category id.
Response is appreciated, ignorance will be understood.
If you only need category ids, then you do not need B. What you are looking for is conditional aggregation, something like:
select category_id,
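-- bigint, because the stored values (price * 10^7) overflow int above roughly 214 USD
sum(case when cast(substring(price, 4, 100) as bigint)/10000000 < 100 then 1 else 0 end) as price_000_100,
sum(case when cast(substring(price, 4, 100) as bigint)/10000000 >= 100 and cast(substring(price, 4, 100) as bigint)/10000000 < 200
then 1 else 0
end) as price_100_200,
. . .
from a
where kind = 3 -- the filter on kind requested in the question
group by category_id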
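As a runnable variant of the same idea, here is a minimal sketch assuming PostgreSQL. It parses the price once in a helper query and uses the FILTER clause, which counts the same way as sum(case ... then 1 else 0 end). The sample rows, the parsed and usd names, and the choice of only two ranges are assumptions for illustration; the lower bound of each range is excluded, as the question asks.

```sql
-- Hypothetical sample rows in the format described in the question.
WITH a (category_id, price, kind) AS (
    VALUES (1, 'USD 500000000',  3),   -- 50 USD
           (1, 'USD 1000000000', 3),   -- 100 USD
           (1, 'USD 1500000000', 3),   -- 150 USD
           (2, 'USD 2000000000', 3),   -- 200 USD
           (2, 'USD 900000000',  1)    -- kind <> 3, filtered out
),
parsed AS (
    -- Drop the 'USD ' prefix (4 characters) and undo the 10^7 scaling.
    SELECT category_id,
           CAST(SUBSTRING(price FROM 5) AS numeric) / 10000000 AS usd
    FROM a
    WHERE kind = 3
)
SELECT category_id,
       COUNT(*) FILTER (WHERE usd > 0   AND usd <= 100) AS price_000_100,
       COUNT(*) FILTER (WHERE usd > 100 AND usd <= 200) AS price_100_200
FROM parsed
GROUP BY category_id
ORDER BY category_id;
```

With this data, category 1 gets 2 items in the first range and 1 in the second, and category 2 gets 0 and 1. Each additional range still has to be written out as its own column.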
There is no standard way to do what you describe.
That is because to do (3) you need a pivot, also known as a crosstab, which is not part of ANSI SQL. Each DBMS has its own implementation, and dynamic column names in a pivot table are an additional complication.
For example, Postgres calls it a "crosstab" and requires the tablefunc module to be installed. See this SO question and the documentation. Compare with SQL Server, which uses the PIVOT operator.
You can get close using reasonably standard SQL.
Here is an example based on SQLite. A little bit of conversion would provide a solution for other systems, e.g. SUBSTR would be substring(string [from int] [for int]) in PostgreSQL.
Assuming a data table of format:
and a category name table of:
then the following code will produce:
WITH dataCTE AS
(SELECT product_id AS 'ID', CAST(SUBSTR(price, 5) AS INT)/10000000 AS 'USD',
CASE WHEN (CAST(SUBSTR(price, 5) AS INT)/10000000) <= 500 THEN
100 ELSE 200
END AS 'Interval'
FROM data
WHERE kind = 3),
groupCTE AS
(SELECT dataCTE.ID AS 'ID', dataCTE.USD AS 'USD', dataCTE.Interval AS 'Interval',
CASE WHEN dataCTE.Interval = 100 THEN
CAST(dataCTE.USD AS INT)/100
ELSE
(CAST(dataCTE.USD-500 AS INT)/200)+5
END AS 'GroupID'
FROM dataCTE),
cleanCTE AS
(SELECT *, CASE WHEN groupCTE.Interval = 100 THEN
CAST(groupCTE.GroupID *100 AS VARCHAR)
|| '-' ||
CAST((groupCTE.GroupID *100)+99 AS VARCHAR)
ELSE
CAST(((groupCTE.GroupID-5)*200)+500 AS VARCHAR)
|| '-' ||
CAST(((groupCTE.GroupID-5)*200)+500+199 AS VARCHAR)
END AS 'diapason'
FROM groupCTE
INNER JOIN cat_name AS cn ON groupCTE.ID = cn.cat_id)
SELECT *
FROM cleanCTE;
If you modify the last SELECT to:
SELECT name, diapason, COUNT(diapason)
FROM cleanCTE
GROUP BY name, diapason;
then you get a grouped output:
This is as close as you will get without specifying the exact system; even then you will have a problem with dynamically creating the column names.