Questions in Oracle SQL


I'm trying to answer these questions but couldn't, and I need help here:
1) List the number of days that have elapsed since each student joined.
This is what I did:
Select FR_FIRSTNAME,
FR_LASTNAME,
trunc(sysdate - FR_DATEJOINED) / 7 DAYS
from alharbi_bandar5_FRESHMEN;
no rows selected
2) List the student names and city in upper case.
This is what I did:
Select FR_FIRSTNAME, FR_LASTNAME, CITY FROM alharbi_bandar5_FRESHMEN
where UPPER (FR_FIRSTNAME, FR_LASTNAME, CITY) like 'SMITH%';
> where UPPER (FR_FIRSTNAME, FR_LASTNAME, CITY) like 'SMITH%'
*
ERROR at line 2:
ORA-00909: invalid number of arguments
3) List the no and last name of the student(s) with the highest ACT score.
This is what I did:
Select FR_NO, FR_LASTNAME, ACT from alharbi_bandar5_FRESHME
where ACT = MAX(ACT);
where ACT = MAX(ACT)
*
ERROR at line 2:
ORA-00934: group function is not allowed here
This is my table:
FR_NO  FR_FIRSTNAME  FR_LASTNAME  FR_DATEJOINED  ACT  CITY
-----  ------------  -----------  -------------  ---  ----------
100    Mark          Ramon        12-JUL-13      21   Florence
101    John          Wright       13-JUN-13      31   Edgewood
102    Peter         Sellers      06-JAN-13      30   Blue Ash
103    Eric          Bates        14-MAY-13      24   Milford
104    Theresa       Boyers       23-APR-13      22   Covingtion
105    Alex          William      04-MAR-13      24   Edgewood
106    Eric          Byrd         23-MAR-13      19   Alexandria
107    Steve         Norris       21-DEC-12      21   Highland
108    Lisa          Nkosi        13-FEB-13      33   Florence
109    Bradley       Rego         21-FEB-12      29   Covington
110    Kathy         Thomas       15-OCT-12      27   Milford
111    Catherine     Jones        17-APR-13      34   Edgewood
112    Emily         Hess         15-NOV-12      36   Highland
113    Josha         Hunter       19-MAY-14      31   Florence

A lot of these questions have answers in the Oracle SQL reference and are mostly syntax issues.
1) trunc(sysdate - FR_DATEJOINED) / 7 DAYS
Oracle gives the difference between two dates in days, so sysdate - FR_DATEJOINED gives you the number of days. That value can include a fractional component, for example 2.5 days if it has been 2 days and 12 hours since the student joined. TRUNC gets rid of the fractional component, but "/7" then converts the result into weeks instead. Why are you doing this?
Either way, I don't believe this query is being run against the table you posted. Otherwise you wouldn't get zero rows, since you are not filtering anything at all.
Check these out for more info on date functions. Note that the first link is the MySQL 5.1 reference manual, so not everything in it applies to Oracle.
http://docs.oracle.com/cd/E17952_01/refman-5.1-en/date-and-time-functions.html
https://www.youtube.com/watch?v=H18UWBoHhHY
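Here is a minimal sketch of the days-elapsed calculation. It runs the idea through Python's built-in sqlite3 module instead of Oracle, which is an assumption made so it runs anywhere. SQLite has no DATE type, so the join dates are stored as ISO text and julianday() stands in for Oracle's date subtraction. The table name freshmen and the as_of date are hypothetical. In Oracle itself, the corresponding expression would simply be TRUNC(SYSDATE - FR_DATEJOINED), without the /7.

```python
import sqlite3
from datetime import date

# A few rows from the table above (hypothetical copy); dates converted to ISO text for SQLite.
SAMPLE = [
    (100, "Mark", "Ramon", "2013-07-12"),
    (101, "John", "Wright", "2013-06-13"),
    (113, "Josha", "Hunter", "2014-05-19"),
]


def days_since_joined(rows, as_of=None):
    """Return [(fr_no, whole days since joining)], measured against as_of (default: today)."""
    as_of = as_of or date.today().isoformat()
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE freshmen (fr_no INTEGER, fr_firstname TEXT, "
        "fr_lastname TEXT, fr_datejoined TEXT)"
    )
    con.executemany("INSERT INTO freshmen VALUES (?, ?, ?, ?)", rows)
    # The julianday() difference is in days (possibly fractional); CAST truncates it like TRUNC().
    # There is no division by 7, so the result stays in days rather than weeks.
    result = con.execute(
        "SELECT fr_no, CAST(julianday(?) - julianday(fr_datejoined) AS INTEGER) "
        "FROM freshmen ORDER BY fr_no",
        (as_of,),
    ).fetchall()
    con.close()
    return result


print(days_since_joined(SAMPLE, as_of="2014-05-20"))
```

The example passes a fixed as_of date only so the output is reproducible. Leaving it out measures against today, which plays the role of SYSDATE.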
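2) The UPPER function accepts a single column name or expression. If you need multiple columns, wrap each column in its own UPPER call. Also, since the task is to list the values in upper case, UPPER belongs in the SELECT list; a WHERE clause only filters rows.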
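Putting both points together, here is a minimal sketch that uses Python's sqlite3 module as a stand-in for Oracle (UPPER() takes a single argument there too). The table name freshmen is hypothetical.

```python
import sqlite3


def names_and_city_upper(rows):
    """Return (FIRSTNAME, LASTNAME, CITY) in upper case for every row."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE freshmen (fr_firstname TEXT, fr_lastname TEXT, city TEXT)")
    con.executemany("INSERT INTO freshmen VALUES (?, ?, ?)", rows)
    # UPPER() takes exactly one argument, so each column gets its own call.
    # The calls sit in the SELECT list because the task is to display values, not filter rows.
    result = con.execute(
        "SELECT UPPER(fr_firstname), UPPER(fr_lastname), UPPER(city) FROM freshmen"
    ).fetchall()
    con.close()
    return result


print(names_and_city_upper([("Mark", "Ramon", "Florence"), ("Peter", "Sellers", "Blue Ash")]))
```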
3) Group functions such as MAX are not allowed in a WHERE clause, which is what ORA-00934 is telling you. You need a subquery to get the max value first and then compare against it in the outer query.
Getting the max value:
Select max(act) from alharbi_bandar5_FRESHME;
So the final query would be:
Select FR_NO, FR_LASTNAME, ACT from alharbi_bandar5_FRESHME
where ACT = (select MAX(ACT) from alharbi_bandar5_FRESHME);
Or you could use Oracle's RANK analytic function:
select fr_no,
fr_lastname,
act
from (
select fr_no, fr_lastname, act,
rank () over (order by act desc) rnk
from alharbi_bandar5_FRESHME
) where rnk = 1;
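To check that the two queries agree, including when several students tie for the top score, here is a minimal sketch that runs both through Python's sqlite3 module instead of Oracle. SQLite 3.25+ supports scalar subqueries and RANK(). The table name freshmen is hypothetical, and the rows are (FR_NO, FR_LASTNAME, ACT) from the table above.

```python
import sqlite3

# (FR_NO, FR_LASTNAME, ACT) taken from the table in the question
SAMPLE = [
    (100, "Ramon", 21), (101, "Wright", 31), (102, "Sellers", 30), (103, "Bates", 24),
    (104, "Boyers", 22), (105, "William", 24), (106, "Byrd", 19), (107, "Norris", 21),
    (108, "Nkosi", 33), (109, "Rego", 29), (110, "Thomas", 27), (111, "Jones", 34),
    (112, "Hess", 36), (113, "Hunter", 31),
]


def top_act(rows):
    """Return (subquery_result, rank_result); both list the student(s) with the highest ACT."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE freshmen (fr_no INTEGER, fr_lastname TEXT, act INTEGER)")
    con.executemany("INSERT INTO freshmen VALUES (?, ?, ?)", rows)
    # 1) Scalar subquery: MAX() is evaluated on its own, then used as a plain value in WHERE.
    by_subquery = con.execute(
        "SELECT fr_no, fr_lastname, act FROM freshmen "
        "WHERE act = (SELECT MAX(act) FROM freshmen) ORDER BY fr_no"
    ).fetchall()
    # 2) RANK(): tied top scores all get rank 1, so every tied student is kept.
    by_rank = con.execute(
        "SELECT fr_no, fr_lastname, act FROM ("
        " SELECT fr_no, fr_lastname, act, RANK() OVER (ORDER BY act DESC) AS rnk"
        " FROM freshmen) WHERE rnk = 1 ORDER BY fr_no"
    ).fetchall()
    con.close()
    return by_subquery, by_rank


print(top_act(SAMPLE))
```

Both forms return every tied student, so neither needs extra handling for ties. A ROW_NUMBER()-based filter would instead keep only one of them.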

Related

What logic should be used to label customers (monthly) based on the categories they bought more often in the preceding 4 calendar months?

I have a table that looks like this:
user  type      quantity  order_id  purchase_date
john  travel    10        1         2022-01-10
john  travel    15        2         2022-01-15
john  books     4         3         2022-01-16
john  music     20        4         2022-02-01
john  travel    90        5         2022-02-15
john  clothing  200       6         2022-03-11
john  travel    70        7         2022-04-13
john  clothing  70        8         2022-05-01
john  travel    200       9         2022-06-15
john  tickets   10        10        2022-07-01
john  services  20        11        2022-07-15
john  services  90        12        2022-07-22
john  travel    10        13        2022-07-29
john  services  25        14        2022-08-01
john  clothing  3         15        2022-08-15
john  music     5         16        2022-08-17
john  music     40        18        2022-10-01
john  music     30        19        2022-11-05
john  services  2         20        2022-11-19
where I have many different users and multiple types, with purchases made daily.
I want to end up with a table in this format:
user  label            month
john  travel           2022-01-01
john  travel           2022-02-01
john  clothing         2022-03-01
john  travel-clothing  2022-04-01
john  travel-clothing  2022-05-01
john  travel-clothing  2022-06-01
john  travel           2022-07-01
john  travel           2022-08-01
john  services         2022-10-01
john  music            2022-11-01
where the label records the most popular type (by share of quantity sold) for each user over the last 4 months, including the current month. For instance, in March 2022 John ordered 200 of 339 units as clothing (January through March), so his label is clothing. For months where two types are almost even, I'd want a double label, as in April (185 travel and 200 clothing out of 409).

The rules aren't set in stone yet, but roughly:
- if two types are about even (e.g. each above 40%), use both types in the label column;
- if three types are about even (e.g. around 30% each), use all three;
- if one type has 40% but the rest is split among many small types, keep just that one;
- where one type is clearly the majority, use that.

One other tricky bit is that a user may have missing months.
Regarding the rules, I think I just need to compare the % of each type, but I don't know how to retrieve the type as a label afterwards. In general, the SQL/BigQuery logic isn't very clear in my head. I have tried some things, but nothing that comes close to the target table.
Broken down into steps, I think I need 3 things:
1) group by user, type and month, and get the partial and total counts (I have done this)
2) retrieve the counts for the past 4 months (I have done something, but it's not quite accurate yet)
3) compare the ratios and build the label column
I'm not very clear on the SQL/BigQuery logic here, so please advise me on the correct steps to achieve the above. I'm working in BigQuery, but plain SQL logic will also help.

Consider the approach below. It looks a little messy and has room for optimization, but I hope it gives you an idea or a direction for addressing your problem.
WITH aggregation AS (
-- monthly quantity per user and type, plus the user's total over the 4-month window
SELECT user, type, DATE_TRUNC(purchase_date, MONTH) AS month, month_no,
SUM(quantity) AS net_qty,
SUM(SUM(quantity)) OVER w1 AS rolling_qty
FROM sample_table, UNNEST([EXTRACT(YEAR FROM purchase_date) * 12 + EXTRACT(MONTH FROM purchase_date)]) month_no
GROUP BY 1, 2, 3, 4
WINDOW w1 AS (
PARTITION BY user ORDER BY month_no RANGE BETWEEN 3 PRECEDING AND CURRENT ROW
)
),
rolling AS (
-- one row per user and month, carrying every (type, net_qty) pair from the 4-month window
SELECT user, month, ARRAY_AGG(STRUCT(type, net_qty)) OVER w2 AS agg, rolling_qty
FROM aggregation
QUALIFY ROW_NUMBER() OVER (PARTITION BY user, month) = 1
WINDOW w2 AS (PARTITION BY user ORDER BY month_no RANGE BETWEEN 3 PRECEDING AND CURRENT ROW)
)
-- label = every type whose share is within 0.10 of the top type's share, joined with '-'
SELECT user, month, ARRAY_TO_STRING(ARRAY(
SELECT type FROM (
SELECT type, SUM(net_qty) / SUM(SUM(net_qty)) OVER () AS pct,
FROM r.agg GROUP BY 1
) QUALIFY IFNULL(FIRST_VALUE(pct) OVER (ORDER BY pct DESC) - pct, 0) < 0.10 -- set threshold to 0.1
), '-') AS label
FROM rolling r
ORDER BY month;
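Because the threshold rules are still in flux, it may help to prototype them outside BigQuery first. The following is a minimal plain-Python sketch of the rule the query above applies:
- a 4-month window that includes the current month;
- every type whose share is within 0.10 of the top share is kept;
- one row per month that has purchases.

The function and parameter names are hypothetical. The sketch is meant for checking thresholds against the sample data, not as a replacement for the SQL.

```python
from collections import defaultdict

# (user, type, quantity, purchase_date) from the sample table; order_id is not needed
SAMPLE = [
    ("john", "travel", 10, "2022-01-10"), ("john", "travel", 15, "2022-01-15"),
    ("john", "books", 4, "2022-01-16"), ("john", "music", 20, "2022-02-01"),
    ("john", "travel", 90, "2022-02-15"), ("john", "clothing", 200, "2022-03-11"),
    ("john", "travel", 70, "2022-04-13"), ("john", "clothing", 70, "2022-05-01"),
    ("john", "travel", 200, "2022-06-15"), ("john", "tickets", 10, "2022-07-01"),
    ("john", "services", 20, "2022-07-15"), ("john", "services", 90, "2022-07-22"),
    ("john", "travel", 10, "2022-07-29"), ("john", "services", 25, "2022-08-01"),
    ("john", "clothing", 3, "2022-08-15"), ("john", "music", 5, "2022-08-17"),
    ("john", "music", 40, "2022-10-01"), ("john", "music", 30, "2022-11-05"),
    ("john", "services", 2, "2022-11-19"),
]


def month_no(iso_date):
    """'YYYY-MM-DD' -> year * 12 + month, the same running month number the query builds."""
    return int(iso_date[:4]) * 12 + int(iso_date[5:7])


def rolling_labels(rows, window=4, threshold=0.10):
    """Map (user, 'YYYY-MM-01') -> label for every month in which the user bought something."""
    qty = defaultdict(lambda: defaultdict(int))  # (user, month_no) -> {type: quantity}
    for user, type_, quantity, purchase_date in rows:
        qty[(user, month_no(purchase_date))][type_] += quantity

    labels = {}
    for user, m in sorted(qty):
        # Sum each type over the current month and the (window - 1) months before it;
        # months with no purchases simply contribute nothing.
        totals = defaultdict(int)
        for prev in range(m - window + 1, m + 1):
            for type_, q in qty.get((user, prev), {}).items():
                totals[type_] += q
        grand = sum(totals.values())
        shares = sorted(((q / grand, t) for t, q in totals.items()), reverse=True)
        top_share = shares[0][0]
        # Keep every type whose share is within `threshold` of the top share.
        label = "-".join(t for share, t in shares if top_share - share < threshold)
        year, month0 = divmod(m - 1, 12)
        labels[(user, f"{year}-{month0 + 1:02d}-01")] = label
    return labels


for key, label in rolling_labels(SAMPLE).items():
    print(key, label)
```

On the sample rows, this reproduces most of the hand-made target table. May 2022, however, comes out as clothing (270 of 450, against 160 travel) instead of travel-clothing. So the 0.10 rule alone does not match the intended labels, and the threshold (or the >40% rule) still needs tuning. Labels are ordered by share, so April reads clothing-travel rather than travel-clothing.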

Postgres rank() without duplicates

I'm ranking race data for a series of cycling events. Racers win various amounts of points for their position in races. I want to retain the discrete event scoring, but also rank each racer in the series. For example, consider a sub-query that returns this:
License #  Rider Name  Total Points  Race Points  Race ID
123        Joe         25            5            567
123        Joe         25            12           234
123        Joe         25            8            987
456        Ahmed       20            12           567
456        Ahmed       20            8            234
You can see Joe has 25 points, as he won 5, 12, and 8 points in three races. Ahmed has 20 points, as he won 12 and 8 points in two races.
Now for the ranking, what I'd like is:
Place  License #  Rider Name  Total Points  Race Points  Race ID
1      123        Joe         25            5            567
1      123        Joe         25            12           234
1      123        Joe         25            8            987
2      456        Ahmed       20            12           567
2      456        Ahmed       20            8            234
But if I use rank() and order by "Total Points", I get:
Place  License #  Rider Name  Total Points  Race Points  Race ID
1      123        Joe         25            5            567
1      123        Joe         25            12           234
1      123        Joe         25            8            987
4      456        Ahmed       20            12           567
4      456        Ahmed       20            8            234
Which makes sense, since there are three "ties" at 25 points.
dense_rank() solves this problem, but if there are legitimate ties across different racers, I want there to be gaps in the rank. For example, if Joe and Ahmed both had 25 points, the next racer would be in third place, not second.
The easiest way to solve this I think would be to issue two queries, one with the "duplicate" racers eliminated, and then a second one where I can retain the individual race data, which I need for the points break down display.
I can also probably, given enough effort, think of a way to do this in a single query, but I'm wondering if I'm not just missing something really obvious that could accomplish this in a single, relatively simple query.
Any suggestions?

You have to break this into steps to get what you want, but that can be done in a single query with common table expressions:
with riders as ( -- get individual riders
select distinct license, rider, total_points
from racers
), places as ( -- calculate non-dense rankings
select license, rider, rank() over (order by total_points desc) as place
from riders
)
select p.place, r.* -- join rankings into main table
from places p
join racers r on (r.license, r.rider) = (p.license, p.rider);
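Here is a minimal sketch of the same two-step idea, run through Python's sqlite3 module instead of PostgreSQL (SQLite 3.25+ supports RANK()). The table name racers and its column names are assumptions. The join uses AND instead of the row-value comparison for portability. The second test case adds a hypothetical third rider, Lee, to show that a tie between different riders still leaves a gap.

```python
import sqlite3

QUERY = """
WITH riders AS (          -- one row per rider, so repeated race rows don't count as ties
    SELECT DISTINCT license, rider, total_points FROM racers
), places AS (            -- RANK() (not DENSE_RANK()) keeps gaps after real ties
    SELECT license, rider, RANK() OVER (ORDER BY total_points DESC) AS place
    FROM riders
)
SELECT p.place, r.license, r.rider, r.total_points, r.race_points, r.race_id
FROM places p
JOIN racers r ON r.license = p.license AND r.rider = p.rider
ORDER BY p.place, r.license, r.race_id
"""


def series_places(rows):
    """rows: (license, rider, total_points, race_points, race_id) tuples."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE racers (license INTEGER, rider TEXT, total_points INTEGER, "
        "race_points INTEGER, race_id INTEGER)"
    )
    con.executemany("INSERT INTO racers VALUES (?, ?, ?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


SAMPLE = [
    (123, "Joe", 25, 5, 567), (123, "Joe", 25, 12, 234), (123, "Joe", 25, 8, 987),
    (456, "Ahmed", 20, 12, 567), (456, "Ahmed", 20, 8, 234),
]
for row in series_places(SAMPLE):
    print(row)
```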

Compare same column in consecutive rows in same table with multiple ID's

I have a user request for a report and I’m too new to SQL programming to know how to approach it.
My user wants to know, for each Staff ID, the min, avg and max number of days between visits. What I don’t know how to figure out is the number of days between Visit 1 and Visit 2, Visit 2 and Visit 3, etc., for each Person ID. Some Person IDs only have one visit; others (most) have multiple visits (up to 26). Here is a snapshot of some data (the full dataset is over 14k records):
PersonID VisitNo StaffID VisitDate
161 1 42344 06/19/2018
163 1 32987 05/14/2018
163 2 32987 09/17/2018
193 1 42344 04/09/2018
193 2 42344 07/18/2018
193 1 33865 07/18/2018
207 1 32987 10/10/2018
207 2 32987 11/05/2018
329 1 42344 04/15/2018
329 2 42344 05/23/2018
329 3 42344 06/10/2018
329 4 42344 07/18/2018
329 1 33865 06/30/2018
329 2 33865 09/14/2018
My research found a lot of references to comparing rows in the same table. I figured out how to compare one visit to the next for a single PersonID using a self join and DATEDIFF. But how do I get from one PersonID to the next, or skip PersonIDs with only 1 visit? And how do I handle a PersonID who has visits with multiple StaffIDs?
Any ideas/suggestions are greatly appreciated as I have two requests that will benefit.

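You can use the analytic function LEAD(myvar, 1) OVER (...). Partition it by PersonID and StaffID and order it by VisitDate, so that each visit is paired with the same person's next visit with the same staff member. Here is an example from https://www.techonthenet.com/sql_server/functions/lead.php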
SELECT dept_id, last_name, salary,
LEAD (salary,1) OVER (ORDER BY salary) AS next_highest_salary
FROM employees;
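Applied to the visits data, a minimal sketch of the LEAD approach might look like this. It runs through Python's sqlite3 module rather than SQL Server, an assumption for portability; SQLite's julianday() replaces DATEDIFF. The table name visits is hypothetical, and the snapshot dates are converted to ISO text. Partitioning by both PersonID and StaffID handles the cases from the question:
- a single-visit person simply produces no gap;
- a person seen by two staff members is measured separately for each.

```python
import sqlite3

# (PersonID, VisitNo, StaffID, VisitDate) from the snapshot, dates as ISO text
VISITS = [
    (161, 1, 42344, "2018-06-19"), (163, 1, 32987, "2018-05-14"),
    (163, 2, 32987, "2018-09-17"), (193, 1, 42344, "2018-04-09"),
    (193, 2, 42344, "2018-07-18"), (193, 1, 33865, "2018-07-18"),
    (207, 1, 32987, "2018-10-10"), (207, 2, 32987, "2018-11-05"),
    (329, 1, 42344, "2018-04-15"), (329, 2, 42344, "2018-05-23"),
    (329, 3, 42344, "2018-06-10"), (329, 4, 42344, "2018-07-18"),
    (329, 1, 33865, "2018-06-30"), (329, 2, 33865, "2018-09-14"),
]

QUERY = """
WITH ordered AS (   -- pair each visit with the same person's next visit to the same staff member
    SELECT staffid, visitdate,
           LEAD(visitdate) OVER (PARTITION BY personid, staffid ORDER BY visitdate) AS next_visit
    FROM visits
), gaps AS (        -- the last visit in each partition has no next visit and is dropped
    SELECT staffid, julianday(next_visit) - julianday(visitdate) AS gap_days
    FROM ordered
    WHERE next_visit IS NOT NULL
)
SELECT staffid, MIN(gap_days), AVG(gap_days), MAX(gap_days)
FROM gaps
GROUP BY staffid
ORDER BY staffid
"""


def gap_stats(rows):
    """Return [(staffid, min_days, avg_days, max_days)] between consecutive visits."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE visits (personid INTEGER, visitno INTEGER, staffid INTEGER, visitdate TEXT)"
    )
    con.executemany("INSERT INTO visits VALUES (?, ?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


for row in gap_stats(VISITS):
    print(row)
```

On the snapshot rows, StaffID 42344 gets gaps of 100, 38, 18 and 38 days: min 18, avg 48.5, max 100.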

For the average number of days, you can just use aggregation:
select personid,
(datediff(day, min(visitdate), max(visitdate)) * 1.0 / nullif(count(*) - 1, 0)) as avg_days_between
from t
group by personid;
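I used SQL Server syntax, but the same idea holds in any database. The average is the maximum minus the minimum, divided by one less than the number of visits. This shortcut only works for the average; the min and max gaps need the individual visit-to-visit differences, e.g. from LEAD or LAG.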
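The shortcut works because the gaps telescope. For one person's visit dates $d_1 \le d_2 \le \dots \le d_n$ (with $n \ge 2$), every intermediate date cancels out of the sum of consecutive differences:

```latex
\bar{g} \;=\; \frac{1}{n-1}\sum_{i=1}^{n-1}\left(d_{i+1} - d_i\right) \;=\; \frac{d_n - d_1}{n-1}
```

For example, PersonID 329's visits with StaffID 42344 have gaps of 38, 18 and 38 days. The span from 04/15 to 07/18 is 94 days, and 94 / 3 ≈ 31.3 days, which equals the mean of the three gaps.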

Access SQL - Select only the last sequence

I have a table with an ID and multiple informative columns. Sometimes, however, I have multiple rows for an ID, so I added a column called "Sequence". Here is a shortened example:
ID Sequence Name Tel Date Amount
124 1 Bob 873-4356 2001-02-03 10
124 2 Bob 873-4356 2002-03-12 7
124 3 Bob 873-4351 2006-07-08 24
125 1 John 983-4568 2007-02-01 3
125 2 John 983-4568 2008-02-08 13
126 1 Eric 345-9845 2010-01-01 18
So, I would like to obtain only these lines:
124 3 Bob 873-4351 2006-07-08 24
125 2 John 983-4568 2008-02-08 13
126 1 Eric 345-9845 2010-01-01 18
Could anyone give me a hand building a SQL query to do this?
Thanks!

You can calculate the maximum sequence per ID using GROUP BY, then join that back to the original data to keep only the row with the maximum sequence.
Assuming your table is called t (note that Access requires an explicit INNER JOIN rather than a bare JOIN):
select t.*
from t inner join
(select id, MAX(sequence) as maxs
from t
group by id
) as tmax
on t.id = tmax.id and
t.sequence = tmax.maxs;
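Here is a minimal sketch of the same GROUP BY and join, run through Python's sqlite3 module instead of Access; this is an assumption for portability, and the query text is otherwise standard SQL. It uses a shortened copy of the sample rows with only ID, Sequence, Name and Amount.

```python
import sqlite3

# (ID, Sequence, Name, Amount): a shortened copy of the sample table
SAMPLE = [
    (124, 1, "Bob", 10), (124, 2, "Bob", 7), (124, 3, "Bob", 24),
    (125, 1, "John", 3), (125, 2, "John", 13),
    (126, 1, "Eric", 18),
]


def last_sequence(rows):
    """Return only the row with the highest sequence for each id."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (id INTEGER, sequence INTEGER, name TEXT, amount INTEGER)")
    con.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)
    # The derived table holds MAX(sequence) per id; the join keeps only rows matching it.
    result = con.execute(
        "SELECT t.* FROM t INNER JOIN "
        "(SELECT id, MAX(sequence) AS maxs FROM t GROUP BY id) AS tmax "
        "ON t.id = tmax.id AND t.sequence = tmax.maxs "
        "ORDER BY t.id"
    ).fetchall()
    con.close()
    return result


print(last_sequence(SAMPLE))
```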

How to generate a column with a series of numbers based on a min and max value

I have a table structured as so:
fake_id start end misc_data
------------------------------------------------------
1 101 105 ab
1 101 105 cd
1 101 105 ef
2 117 123 gh
2 117 123 ij
2 117 123 kl
2 117 123 mn
3 51 53 op
3 51 53 qr
Notice that the fake_id field is not really a primary key, but is repeated a number of times equal to the number of distinct odd numbers in the range specified by start and end. The real id for each record is one of the odd numbers in that range. I need to write a query that returns fake_id, misc_data, and another column that contains those odd numbers to produce a real id, as follows:
fake_id real_id misc_data
------------------------------------------
1 101 ab
1 103 cd
1 105 ef
2 117 gh
2 119 ij
2 121 kl
2 123 mn
3 51 op
3 53 qr
As far as I know, there is no guarantee that there will be no gaps in the sequence (for example, there might be no records for range 21-31). How do I tell the query (or procedure, but query is preferable) that for each record with a particular fake_id, it should return the next odd number between start and end?
Also, is there a way to make the values for misc_data belong to a particular real_id? Using the second table as an example, how could I tell the query that "ab" belongs to real_id 101 instead of 103?
Thanks in advance.

Guessing here that you plan to sort on misc_data:
SELECT "fake_id",
((ROW_NUMBER()OVER(PARTITION BY "start"
ORDER BY "misc_data")-1)*2)+"start" AS "real_id",
"misc_data"
FROM t
ORDER BY "misc_data";
http://www.sqlfiddle.com/#!4/ae23c/23
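To see what the formula produces, here is a minimal sketch that runs it through Python's sqlite3 module instead of Oracle, an assumption for portability. "end" must be quoted because it is an SQLite keyword. The n-th row of each range, in misc_data order, gets start + 2 * (n - 1). Which misc_data value lands on which real_id is therefore decided purely by that ORDER BY; "ab" gets 101 only because it sorts first. The sketch also assumes each range has exactly as many rows as odd numbers, as in the question.

```python
import sqlite3

# (fake_id, start, end, misc_data) from the first table in the question
SAMPLE = [
    (1, 101, 105, "ab"), (1, 101, 105, "cd"), (1, 101, 105, "ef"),
    (2, 117, 123, "gh"), (2, 117, 123, "ij"), (2, 117, 123, "kl"), (2, 117, 123, "mn"),
    (3, 51, 53, "op"), (3, 51, 53, "qr"),
]


def real_ids(rows):
    """Return [(fake_id, real_id, misc_data)], numbering odd ids upward from start."""
    con = sqlite3.connect(":memory:")
    # "end" is an SQLite keyword, so both range columns are quoted.
    con.execute('CREATE TABLE t (fake_id INTEGER, "start" INTEGER, "end" INTEGER, misc_data TEXT)')
    con.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)
    result = con.execute(
        'SELECT fake_id, '
        '(ROW_NUMBER() OVER (PARTITION BY "start" ORDER BY misc_data) - 1) * 2 + "start" AS real_id, '
        'misc_data '
        'FROM t ORDER BY misc_data'
    ).fetchall()
    con.close()
    return result


for row in real_ids(SAMPLE):
    print(row)
```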

Apologies for not answering sooner or replying to the individual comments. @John Dewey, I believe that when I tried your script, it did not correctly keep the gaps between the start-end series. It did motivate me to learn more about the PARTITION keyword, though, and I think I am more enlightened now.
Since this was for an ETL task, I ended up writing code to generate the real IDs in a loop on the extract (I guess it would also count as a transform) side.
