Postgres - Changing values within columns - sql

I have a query that returns a wide dataset with one row per student and multiple columns per 'score':
Student ID  score1  score2  score3 ...
12345       101     102     103
67890       102     103     104
The scores are not actual scores, but instead are score ids that need to be translated to actual scores.
I would like to return the actual scores instead of the score ids. I know that I can just write a bunch of CASE statements that will do the translation for each column, but there are about 20 columns that need to be translated. I'm hoping that there is a more efficient way of doing this.
Cheers,
Jonathon

You probably want to make a scores table and then join to that. That will take away the need to write an absurd case query.
CREATE TABLE code_scores (
    ScoreID INT
  , Value INT
);
INSERT INTO code_scores (scoreid, value)
VALUES
    (101, 100)
  , (102, 99);
-- Translate one column; repeat the join with a new alias for each score column
SELECT t.studentID, s1.value AS score1
FROM yourtable t
INNER JOIN code_scores s1
    ON t.score1 = s1.scoreID;
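With about 20 score columns, the lookup table can be joined once per column, each time under a different alias. Here is a minimal sketch of that idea, run against an in-memory SQLite database from Python rather than Postgres. The table and column names follow the answer; the values for score ids 103 and 104 are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE code_scores (scoreid INT PRIMARY KEY, value INT);
    -- values for 103 and 104 are invented sample data
    INSERT INTO code_scores (scoreid, value)
    VALUES (101, 100), (102, 99), (103, 98), (104, 97);

    CREATE TABLE yourtable (studentid INT, score1 INT, score2 INT, score3 INT);
    INSERT INTO yourtable VALUES (12345, 101, 102, 103), (67890, 102, 103, 104);
""")

score_columns = ["score1", "score2", "score3"]  # extend to all score columns

# One LEFT JOIN per score column, each with its own alias (s1, s2, ...).
select_parts = ", ".join(
    f"s{i}.value AS {col}" for i, col in enumerate(score_columns, 1)
)
join_parts = "\n".join(
    f"LEFT JOIN code_scores s{i} ON t.{col} = s{i}.scoreid"
    for i, col in enumerate(score_columns, 1)
)
query = (
    f"SELECT t.studentid, {select_parts}\n"
    f"FROM yourtable t\n{join_parts}\n"
    "ORDER BY t.studentid"
)

translated = conn.execute(query).fetchall()
print(translated)
```

LEFT JOIN is used so that a student row survives even when one of its score ids has no entry in the lookup table; the untranslated column then comes back as NULL.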

Related

Given a table of numbers, can I get all the rows which add up to less than or equal to a number?

Say I have a table with an incrementing id column and a random positive non zero number.
id  rand
1   12
2   5
3   99
4   87
Write a query to return the rows which add up to a given number.
A couple rules:
Rows must be "consumed" in order, even if a later row would make it a perfect match. For example, querying for 104 would be a perfect match for rows 1, 2, and 4, but rows 1-3 would still be returned.
You can use a row partially if it holds more than is needed to cover whatever is left of the number. E.g. rows 1, 2, and 3 would be returned if your max number is 50, because 12 + 5 + 33 equals 50 and row 3 (99) is only partially used.
If there are not enough rows to satisfy the amount, then return ALL the rows. E.g. in the above example a query for 1,000 would return rows 1-4. In other words, the sum of the rows should be less than or equal to the queried number.
It's possible for the answer to be "no this is not possible with SQL alone" and that's fine but I was just curious. This would be a trivial problem with a programming language but I was wondering what SQL provides out of the box to do something as a thought experiment and learning exercise.
You didn't mention which RDBMS, but assuming SQL Server:
DROP TABLE IF EXISTS #t;
CREATE TABLE #t (id int, rand int);
INSERT INTO #t (id, rand)
VALUES (1,12),(2,5),(3,99),(4,87);
DECLARE @target int = 104;
WITH dat AS
(
    -- running total up to and including each row
    SELECT id, rand, SUM(rand) OVER (ORDER BY id) AS runsum
    FROM #t
),
dat2 AS
(
    -- running total up to the previous row (0 for the first row)
    SELECT id, rand
         , runsum
         , COALESCE(LAG(runsum, 1) OVER (ORDER BY id), 0) AS prev_runsum
    FROM dat
)
SELECT id, rand
FROM dat2
WHERE @target >= runsum                              -- row is fully consumed
   OR (@target > prev_runsum AND @target < runsum);  -- row is partially consumed
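The same running-sum idea can be tried out without SQL Server. The sketch below is an assumption-laden variant, not the answer's exact query: it runs on in-memory SQLite from Python, replaces the window functions with a correlated subquery for the previous running total, and adds a hypothetical `used` column showing how much of each row is consumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (id INT, rand INT);
    INSERT INTO t (id, rand) VALUES (1, 12), (2, 5), (3, 99), (4, 87);
""")

QUERY = """
    SELECT id, rand, MIN(rand, :target - prev_runsum) AS used
    FROM (
        SELECT id, rand,
               -- sum of all earlier rows; 0 for the first row
               (SELECT COALESCE(SUM(p.rand), 0) FROM t p WHERE p.id < t.id)
                   AS prev_runsum
        FROM t
    )
    WHERE prev_runsum < :target   -- row is needed only if the target is not yet reached
    ORDER BY id
"""

def consume(target):
    """Return (id, rand, used) for every row needed to reach the target."""
    return conn.execute(QUERY, {"target": target}).fetchall()

print(consume(50))    # rows 1-3, the last one partially used
print(consume(1000))  # not enough rows, so all of them come back
```

A row is returned exactly when the total of the rows before it is still below the target, which covers the fully consumed, partially consumed, and "not enough rows" cases with a single condition.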

Flaw in my logic of understanding the percentile() function in Hive

Apologies for the rather basic question, however I have been struggling to understand and find any useful examples for a problem I have using the percentile() function in Hive.
Let's say I have a basic table:
Name  | ID | Salary
Tom   | 25 | 20,000
Jim   | 01 | 25,000
Larry | 72 | 80,000
King  | 05 | 32,000
and I want a percentile value for each row (calculated using the Salary column).
What I've tried to use is
Select
Name,
ID,
Salary,
percentile(Salary, array(0.25, 0.5, 0.75)) as percentile_value
group by
Name,
ID,
Salary
However, the output was the exact Salary values, which has led me to believe that I have misunderstood how this function works. I was expecting something along the lines of
0.25
0.5
0.75
0.25
If someone can point me in the right direction or help me further understand this it would be very helpful.
I think it's working fine. This is as per the documentation:
"Returns the exact pth percentile (or percentiles p1, p2, ..) of a column in the group."
You are using Salary both inside percentile() and in the GROUP BY, so every group contains a single Salary value. That is like calling percentile(constant_value, array(0.25, 0.5, 0.75)), which will always return [constant_value,constant_value,constant_value].
As far as I know, percentile() works over the range of values in a group, so each group should contain several different values. Your sample data has all unique values, so I created my own data and experimented. Let me know what you think :)
My code and data are below. I inserted multiple values with the same id so that proper percentiles can be calculated.
create table tmp2(id int, name string, sal int);
insert into tmp2 values (25, 'Larry',55000);
insert into tmp2 values (25, 'Larry',5000);
insert into tmp2 values (25, 'Larry',125000);
insert into tmp2 values (5, 'Tim',125000);
Select id, percentile(sal, array(0.25, 0.5, 0.75)) as percentile_value from tmp2 group by id ;
Result -
id  percentile_value
5   [125000.0,125000.0,125000.0]
25  [30000.0,55000.0,90000.0]
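To see where numbers such as 30000.0 and 90000.0 come from, here is a small Python sketch. It assumes the percentile is found by linear interpolation between neighbouring sorted values; that assumption reproduces the result above but is not stated in the quoted documentation. The function name is made up for illustration.

```python
import math

def interpolated_percentile(values, p):
    """Percentile p (0..1) by linear interpolation between sorted neighbours."""
    ordered = sorted(values)
    position = p * (len(ordered) - 1)   # fractional index into the sorted list
    lower = math.floor(position)
    upper = math.ceil(position)
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction

larry = [55000, 5000, 125000]   # the three rows with id 25
tim = [125000]                  # the single row with id 5

larry_result = [interpolated_percentile(larry, p) for p in (0.25, 0.5, 0.75)]
tim_result = [interpolated_percentile(tim, p) for p in (0.25, 0.5, 0.75)]
print(larry_result)
print(tim_result)
```

A group with one value can only ever return that value, which is what happened in the original query where Salary was part of the GROUP BY. If the goal is a position per row rather than a value per group, a ranking window function such as PERCENT_RANK() or CUME_DIST() is probably a closer fit.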

How to write a SQL query to calculate percentages based on values across different tables?

Suppose I have a database containing two tables, similar to below:
Table 1:
tweet_id  tweet
1         Scrap the election results
2         The election was great!
3         Great stuff
Table 2:
politician  tweet_id
TRUE        1
FALSE       2
FALSE       3
I'm trying to write a SQL query which returns the percentage of tweets that contain the word 'election' broken down by whether they were a politician or not.
So for instance here, the first 2 tweets in Table 1 contain the word election. By looking at Table 2, you can see that tweet_id 1 was written by a politician, whereas tweet_id 2 was written by a non-politician.
Hence, the result of the SQL query should return 50% for politicians and 50% for non-politicians (i.e. two tweets contained the word 'election', one by a politician and one by a non-politician).
Any ideas how to write this in SQL?
You could do this by creating one subquery to return all election tweets, and one subquery to return all election tweets by politicians, then join.
Here is a sample. Note that you may need to cast the totals to decimals before dividing (depending on which SQL provider you are working in).
select
    politician_tweets.total / election_tweets.total
from
    (
        select
            count(tweet) as total
        from
            table_1
            join table_2 on table_1.tweet_id = table_2.tweet_id
        where
            tweet like '%election%'
    ) election_tweets
    join
    (
        select
            count(tweet) as total
        from
            table_1
            join table_2 on table_1.tweet_id = table_2.tweet_id
        where
            tweet like '%election%' and
            politician = 1
    ) politician_tweets
    on 1 = 1
You can use aggregation like this:
select t2.politician, avg( case when t.tweet like '%election%' then 1.0 else 0 end) as election_ratio
from tweets t join
table2 t2
on t.tweet_id = t2.tweet_id
group by t2.politician;
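For the breakdown the question asks for (the share of 'election' tweets written by politicians versus non-politicians), one option is to group the matching tweets and divide by their overall count. This is a minimal sketch on in-memory SQLite from Python, using the sample rows from the question. Storing the politician flag as 1/0 is an assumption made here because SQLite has no separate boolean type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_1 (tweet_id INT, tweet TEXT);
    INSERT INTO table_1 VALUES
        (1, 'Scrap the election results'),
        (2, 'The election was great!'),
        (3, 'Great stuff');

    CREATE TABLE table_2 (politician INT, tweet_id INT);  -- 1 = TRUE, 0 = FALSE
    INSERT INTO table_2 VALUES (1, 1), (0, 2), (0, 3);
""")

shares = conn.execute("""
    SELECT t2.politician,
           -- multiply by 100.0 first so the division is not integer division
           100.0 * COUNT(*)
               / (SELECT COUNT(*) FROM table_1 WHERE tweet LIKE '%election%') AS pct
    FROM table_1 t1
    JOIN table_2 t2 ON t1.tweet_id = t2.tweet_id
    WHERE t1.tweet LIKE '%election%'
    GROUP BY t2.politician
    ORDER BY t2.politician
""").fetchall()
print(shares)
```

Each output row is a (politician flag, percentage) pair, and the percentages add up to 100 across the groups.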

Project multiple rows into a single row based on columns values in sql

I would like to know how you project multiple related rows into a single row, for example, a product that comes in multiple parts will have multiple SKUs but I want to project the multiple parts into a single row.
I'm sure this is possible but struggling to define the query for the desired result.
Given the example dataset
I would like to project my result to the following
What ends up in the product code or product name columns is irrelevant, essentially I just need a single row to represent these two rows.
How would I achieve this?
It depends on the format of the data stored in ProductCode and ProductName.
Based on that format, you have to write expressions that extract the part of each value that identifies the product.
Then, of course, you have to decide which ID to keep for the combined row.
In my example I use a simple substr(…) transformation to extract the necessary data,
and I use max(ID) to choose the ID for the row.
Test data:
insert into table1 (CustId, ProductCode, ProductName)
values
(10, 'Prod1Part1', 'Product1 Part1'),
(10, 'Prod1Part2', 'Product1 Part2'),
(10, 'Prod1Part3', 'Product1 Part3'),
(10, 'Prod2Part1', 'Product2 Part1'),
(10, 'Prod2Part2', 'Product2 Part2')
;
A query:
SELECT
    (SELECT
        MAX(id)
     FROM
        table1
     WHERE
        SUBSTR(ProductCode, 1, 5) = NewProductCode) id,
    CustId,
    NewProductCode,
    NewProductName
FROM
    (SELECT DISTINCT
        CustId, SUBSTR(ProductCode, 1, 5) NewProductCode,
        -- stop one character before the space so the name has no trailing blank
        SUBSTR(ProductName, 1, INSTR(ProductName, ' ') - 1) NewProductName
     FROM
        table1) x
The output:
8 10 Prod1 Product1
10 10 Prod2 Product2
Is it clear? Ask me to improve the answer, if it's not.
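The same projection can also be written as a single GROUP BY, which avoids the correlated subquery. Here is a sketch on in-memory SQLite from Python, assuming (as the answer does) that the first five characters of ProductCode and the first word of ProductName identify the product. The auto-generated ids here run from 1 to 5, so they differ from the ids in the output above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (
        id INTEGER PRIMARY KEY,   -- assigned automatically: 1..5
        CustId INT,
        ProductCode TEXT,
        ProductName TEXT
    );
    INSERT INTO table1 (CustId, ProductCode, ProductName) VALUES
        (10, 'Prod1Part1', 'Product1 Part1'),
        (10, 'Prod1Part2', 'Product1 Part2'),
        (10, 'Prod1Part3', 'Product1 Part3'),
        (10, 'Prod2Part1', 'Product2 Part1'),
        (10, 'Prod2Part2', 'Product2 Part2');
""")

collapsed = conn.execute("""
    SELECT MAX(id) AS id,
           CustId,
           SUBSTR(ProductCode, 1, 5) AS NewProductCode,
           SUBSTR(ProductName, 1, INSTR(ProductName, ' ') - 1) AS NewProductName
    FROM table1
    GROUP BY CustId, NewProductCode, NewProductName
    ORDER BY NewProductCode
""").fetchall()
print(collapsed)
```

Grouping by the extracted values yields one row per product, and MAX(id) picks a representative id for each group.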

How to retrieve data that is not in the same order as the query in SQL?

I am trying to retrieve a record from a table in SQL.
Here is what I want. For example:
I have a table name studentScore with two columns:
studentName ----- Scores
John Smith ----- 75,83, 96
I want to do this: When I type the score in a search box, I want it to show me the name of the student. For example: I could type "83, 96, 75", (the scores can be in any order) and this should show me the student name "John Smith". But I'm wondering how we could specify in the WHERE clause so that it picks up the correct record, if what we type in the box is not in the same order as the original data in the column?
Your issue is that your data is not properly normalized. You are putting a 1-to-n relationship into a single table. If you reorganized your tables like this:
Table Students
id  name
1   John Smith
Table Scores
studentId  score
1          75
1          83
1          96
You could do a query like:
select st.name from Students st, Scores sc where st.id = sc.studentId and sc.score in (83, 75, 96)
This also helps if you want to do other queries, like find out which students have a score of at least X, which would be otherwise impossible with your existing table layout.
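Note that IN on its own matches students who have any one of the typed scores. To require all of them, regardless of the order in which they were typed, a common pattern is GROUP BY with HAVING COUNT(DISTINCT ...). Here is a minimal sketch on in-memory SQLite from Python; the second student and the helper name `find_students` are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Students (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Scores (studentId INT, score INT);
    INSERT INTO Students VALUES (1, 'John Smith'), (2, 'Foo bar');
    -- the rows for student 2 are invented sample data
    INSERT INTO Scores VALUES (1, 75), (1, 83), (1, 96), (2, 73), (2, 83);
""")

def find_students(search_text):
    """Names of students who have every score typed into the search box."""
    wanted = sorted({int(part) for part in search_text.split(",")})
    placeholders = ", ".join("?" for _ in wanted)
    query = f"""
        SELECT st.name
        FROM Students st
        JOIN Scores sc ON st.id = sc.studentId
        WHERE sc.score IN ({placeholders})
        GROUP BY st.id, st.name
        HAVING COUNT(DISTINCT sc.score) = ?   -- all wanted scores were found
        ORDER BY st.name
    """
    return [name for (name,) in conn.execute(query, [*wanted, len(wanted)])]

print(find_students("83, 96, 75"))
print(find_students("83"))
```

The scores are passed as bound parameters rather than pasted into the SQL string, so the text from the search box never becomes part of the query itself.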
If you must stick with your existing layout, which I don't recommend, you could split up the user input and then do a query like
select studentName from studentScore where Scores like '%75%' or Scores like '%83%' or Scores like '%96%'
But I really would refrain from doing so.
I suppose it is solvable, but it would be simpler if the scores for each student were stored as separate rows, for example in a scores table. Otherwise, the code would have to permute the entry into every conceivable order. Or the scores entry would have to be in a standard order somehow.
If you do not want to create a new table for Scores (e.g. with StudentId and Score columns), you can sort the numbers before storing them.
This way, when someone types a query, you sort those numbers as well and just compare it to the stored strings.
If you need the original position of the scores, you can store those in a separate field.
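The sort-before-compare idea can be sketched in a few lines of Python. The function name and the choice of a comma with no spaces as the stored format are assumptions made for illustration.

```python
def canonical_scores(text):
    """Turn '83, 96, 75' into the canonical string '75,83,96'."""
    numbers = sorted(int(part) for part in text.split(",") if part.strip())
    return ",".join(str(n) for n in numbers)

stored = canonical_scores("75,83, 96")   # what would be written to the Scores column
typed = canonical_scores("83, 96, 75")   # what the user entered in the search box

print(stored, typed, stored == typed)
```

With both sides normalized the same way, the lookup becomes a plain equality test on the Scores column, which can use an index.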
Improve your database schema...this does not satisfy even the first normal form (http://en.wikipedia.org/wiki/Database_normalization#Normal_forms).
Improving the schema will save you plenty of headaches in the future (stemming from update anomalies).
No SQL table should hold multiple values for an attribute in the same column. Are the scores stored as a string? If so, your query will be more complicated and you're defeating the point of the DB.
However, to your question:
SELECT col4, col3, col2 FROM students WHERE col1 = 57;
This will return columns 4, 3, and 2 in that order (4, 3, 2) even if they are stored in the order 1, 2, 3, 4. SQL returns the columns you ask for in the order you ask for them.
So yeah, I agree with everyone else that this design is crap. If you were to normalize this table properly, you would be able to very easily get the data you need.
However, this is how you could do it with the current structure. Split the user input into discrete scores. Then, pass each value into the procedure.
CREATE PROCEDURE FindStudentByScores
(
    @score1 AS VARCHAR(3) = NULL
   ,@score2 AS VARCHAR(3) = NULL
   ,@score3 AS VARCHAR(3) = NULL
)
AS
BEGIN
    -- Every score that was supplied must appear somewhere in the Scores string
    SELECT *
    FROM [Students]
    WHERE ( @score1 IS NULL
            OR [Scores] LIKE '%' + @score1 + '%' )
      AND ( @score2 IS NULL
            OR [Scores] LIKE '%' + @score2 + '%' )
      AND ( @score3 IS NULL
            OR [Scores] LIKE '%' + @score3 + '%' )
END
You could use regular expressions or the LIKE operator.
A regexp solution could look like
SIMILAR TO '%(SCORE1|SCORE2|SCORE3)%'
That's the easiest way to go, but I recommend changing your entire table structure, as has been mentioned a couple of times now. With the current layout the query cannot take advantage of an index or key, so it will exhaust the server with as few as a couple of tens of visitors.
This is an example of where database normalization should help you a lot.
You could store your data like this
(Edit: if you want to keep the order you can add an order column)
studentName  Scores  Order
John Smith   75      1
John Smith   83      2
John Smith   96      3
Foo bar      73      1
Foo bar      34      2
........
But if you are stuck with the current model your next best option is to have the Scores column sorted, then you just need to take the search string from the textbox, sort and format it correctly, then you can search.
Lastly, if the scores are not sorted in the table, you can create all possible orderings
75, 83, 96
75, 96, 83
83, 75, 96
83, 96, 75
96, 75, 83
96, 83, 75
and search for them all with OR.
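Generating those orderings by hand does not scale, so here is a hedged Python sketch that builds them and the matching OR clause with bound parameters. The helper name and the ", " separator are assumptions about how the Scores strings are stored.

```python
from itertools import permutations

def all_orderings(scores):
    """Every ordering of the scores, formatted like the stored strings."""
    return [", ".join(str(s) for s in order) for order in permutations(scores)]

orderings = all_orderings([75, 83, 96])

# One "Scores = ?" test per ordering, combined with OR; the values are bound separately.
where_clause = " OR ".join("Scores = ?" for _ in orderings)
query = f"SELECT studentName FROM studentScore WHERE {where_clause}"

print(orderings)
print(query)
```

The number of orderings grows factorially (3 scores give 6 strings, 5 scores give 120), which is one more reason to prefer the normalized layout.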