Split a column based on a character in BigQuery - google-bigquery

I have a table as shown below on BigQuery
Name  | Score
Tim   | 63 > 89 > 90
James | 67 > 44
I want to split the Score column into N separate columns, where N is the maximum number of scores in any row of the table. I would like the table to look as follows.
Name  | Score_1 | Score_2 | Score_3
Tim   | 63      | 89      | 90
James | 67      | 44      | 0 or NA
I tried the SPLIT function, but I end up with a new row for each Name-Score combination.

For BigQuery Standard SQL
Below is a simple case that assumes you know the expected maximum number of scores in advance (3 in the example below)
#standardSQL
WITH `project.dataset.your_table` AS (
SELECT 'Tim' name, '63 > 89 > 90' score UNION ALL
SELECT 'James', '67 > 44'
)
SELECT
name,
score[SAFE_OFFSET(0)] AS score_1,
score[SAFE_OFFSET(1)] AS score_2,
score[SAFE_OFFSET(2)] AS score_3
FROM (
SELECT name, SPLIT(score, ' > ') score
FROM `project.dataset.your_table`
)
with result
Row name score_1 score_2 score_3
1 Tim 63 89 90
2 James 67 44 null
Of course, this approach means that if you have many scores (10, 20 or more) you will need to add a corresponding number of extra lines, like the one below
score[SAFE_OFFSET(20)] AS score_21
So the above gives you what you wanted as far as the output schema is concerned.
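When the number of scores is large, the repeated SAFE_OFFSET lines can be generated rather than typed by hand. The following is a minimal sketch in Python that only builds the query text. The helper name build_split_query is hypothetical, and the sketch assumes you have already determined the maximum number of scores, for example with a separate query along the lines of SELECT MAX(ARRAY_LENGTH(SPLIT(score, ' > '))) against your table.

```python
def build_split_query(table: str, max_scores: int) -> str:
    """Build the SAFE_OFFSET query text for a given number of score columns.

    Hypothetical helper: it only returns SQL text and never contacts BigQuery.
    """
    # One "score[SAFE_OFFSET(i)] AS score_<i+1>" line per output column.
    columns = ",\n".join(
        f"  score[SAFE_OFFSET({i})] AS score_{i + 1}" for i in range(max_scores)
    )
    return (
        "#standardSQL\n"
        "SELECT\n"
        "  name,\n"
        f"{columns}\n"
        "FROM (\n"
        "  SELECT name, SPLIT(score, ' > ') score\n"
        f"  FROM `{table}`\n"
        ")"
    )


# Reproduces the three-column query from the answer.
query = build_split_query("project.dataset.your_table", 3)
print(query)
```

The generated text still has to be submitted to BigQuery separately; the sketch deliberately stops at building the string.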
That said, the query below makes more sense to me and is the better choice in most practical cases:
#standardSQL
WITH `project.dataset.your_table` AS (
SELECT 'Tim' name, '63 > 89 > 90' score UNION ALL
SELECT 'James', '67 > 44'
)
SELECT name, score
FROM `project.dataset.your_table`, UNNEST(SPLIT(score, ' > ')) score
with result
Row name score
1 Tim 63
2 Tim 89
3 Tim 90
4 James 67
5 James 44

Related

I want help in 2 queries : a)How many students have graduated with first class? b)How many students have obtained distinction?

The grading of the students based on the marks they have obtained is done as follows:
40-50 - Second class
50-60 - First Class
60-80 - First Class
80-100 - Distinctions.
The table Stud for the above query is given as:
ID|Name |Marks
11|Britney | 95
12|Dyana | 55
13|Jenny | 66
14|Christene| 88
15|Meera | 24
16|Priya | 76
17|Priyanka | 77
18|Paige | 74
19|Samantha | 87
21|Julia | 96
27|Evil | 79
29|Jane | 64
31|Scarlet | 80
32|Kristeen |100
34|Fanny | 75
37|Belvet | 78
38|Danny | 75
I've tried creating the grade table with assigning the grades first to the table with the following query:
Select from stud Grade=Case
when marks>100 then 'Distinction'
when 80>70 and marks<100 then 'Distinction'
when marks>60 and marks<80 then 'First Class'
when marks>50 and marks<60 then 'First Class'
when marks>40 and marks<50 then 'Second Class'
when marks<40 then 'Fail'
else 'No Grade Available' end Grade ;
Select count(*) res from stud where marks between 80 and 100
The result will have a column "res" with the count of students who obtained marks between 80 and 100.
I would sum counters that equal 1 exactly when the marks value meets the condition for the group in question, using a CASE WHEN expression:
WITH
-- your input, don't use in real query ..
indata(ID,Name,Marks) AS (
SELECT 11,'Britney',95
UNION ALL SELECT 12,'Dyana',55
UNION ALL SELECT 13,'Jenny',66
UNION ALL SELECT 14,'Christene',88
UNION ALL SELECT 15,'Meera',24
UNION ALL SELECT 16,'Priya',76
UNION ALL SELECT 17,'Priyanka',77
UNION ALL SELECT 18,'Paige',74
UNION ALL SELECT 19,'Samantha',87
UNION ALL SELECT 21,'Julia',96
UNION ALL SELECT 27,'Evil',79
UNION ALL SELECT 29,'Jane',64
UNION ALL SELECT 31,'Scarlet',80
UNION ALL SELECT 32,'Kristeen',100
UNION ALL SELECT 34,'Fanny',75
UNION ALL SELECT 37,'Belvet',78
UNION ALL SELECT 38,'Danny',75
)
-- real query starts here
SELECT
SUM(CASE WHEN marks > 50 AND marks <= 80 THEN 1 END) AS first_class_count
, SUM(CASE WHEN marks > 80 THEN 1 END) AS distinction_count
FROM indata
-- out first_class_count | distinction_count
-- out -------------------+-------------------
-- out 11 | 5
You can define the categories in your SELECT statement and use the same expression in the GROUP BY to get a result that is easier to read. Also, make sure to define the boundaries correctly. Here is an example:
SELECT CASE WHEN marks BETWEEN 81 AND 100 then 'Distinction'
WHEN marks BETWEEN 51 AND 80 then 'First Class'
WHEN marks BETWEEN 40 AND 50 then 'Second class'
ELSE 'No Grade Available'
END Grade,
COUNT(*) AS stud_count
FROM stud
WHERE marks > 50
GROUP BY CASE WHEN marks BETWEEN 81 AND 100 then 'Distinction'
WHEN marks BETWEEN 51 AND 80 then 'First Class'
WHEN marks BETWEEN 40 AND 50 then 'Second class'
ELSE 'No Grade Available'
END
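As a quick way to check the boundaries, here is a sketch that runs the same CASE expression against the sample data using SQLite through Python's sqlite3 module. SQLite is only a stand-in chosen so the example runs locally, and the WHERE filter is left out on purpose so that every student lands in some category. The grade is computed once in a derived table (the alias graded is an assumption) so the CASE expression does not have to be repeated in the GROUP BY.

```python
import sqlite3

# Sample rows copied from the question's Stud table.
rows = [
    (11, "Britney", 95), (12, "Dyana", 55), (13, "Jenny", 66),
    (14, "Christene", 88), (15, "Meera", 24), (16, "Priya", 76),
    (17, "Priyanka", 77), (18, "Paige", 74), (19, "Samantha", 87),
    (21, "Julia", 96), (27, "Evil", 79), (29, "Jane", 64),
    (31, "Scarlet", 80), (32, "Kristeen", 100), (34, "Fanny", 75),
    (37, "Belvet", 78), (38, "Danny", 75),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stud (id INTEGER, name TEXT, marks INTEGER)")
conn.executemany("INSERT INTO stud VALUES (?, ?, ?)", rows)

# Compute the grade once per row, then group by the resulting alias.
grade_counts = conn.execute(
    """
    SELECT grade, COUNT(*) AS stud_count
    FROM (
        SELECT CASE WHEN marks BETWEEN 81 AND 100 THEN 'Distinction'
                    WHEN marks BETWEEN 51 AND 80 THEN 'First Class'
                    WHEN marks BETWEEN 40 AND 50 THEN 'Second class'
                    ELSE 'No Grade Available'
               END AS grade
        FROM stud
    ) AS graded
    GROUP BY grade
    ORDER BY grade
    """
).fetchall()

print(grade_counts)
# [('Distinction', 5), ('First Class', 11), ('No Grade Available', 1)]
```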

Row_Number Sybase SQL Anywhere change on multiple condition

I have a selection that returns
EMP DOC DATE
1 78 01/01
1 96 02/01
1 96 02/01
1 105 07/01
2 4 04/01
2 7 04/01
3 45 07/01
3 45 07/01
3 67 09/01
I want to add a row number (I'll use it as a primary ID) that restarts whenever "EMP" changes and does not change when the DOC is the same as in the previous row, like this:
EMP DOC DATE ID
1 78 01/01 1
1 96 02/01 2
1 96 02/01 2
1 105 07/01 3
2 4 04/01 1
2 7 04/01 2
3 45 07/01 1
3 45 07/01 1
3 67 09/01 2
In SQL Server I could use LAG to compare against the previous DOC, but I can't find a way to do this in Sybase SQL Anywhere. I'm using ROW_NUMBER partitioned by "EMP", but it's not what I need.
SELECT EMP, DOC, DATE, ROW_NUMBER() OVER (PARTITION BY EMP ORDER BY EMP, DOC, DATE) ID -- <== THIS WILL CHANGE THE ROW NUMBER ON SAME DOC ON SAME EMP, SO WOULD NOT WORK.
Anyone have a direction for this?
You seem to want dense_rank():
select
emp,
doc,
date,
dense_rank() over(partition by emp order by date) id
from mytable
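This numbers rows within groups having the same emp, and increments only when date changes, without gaps. Note that in the sample data EMP 2 has two different DOC values on the same date, so reproducing the expected IDs there requires ordering by doc (or by date, doc) instead.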
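To see how the ORDER BY column affects the numbering on the sample data, here is a sketch that runs DENSE_RANK() in SQLite through Python's sqlite3 module. SQLite is only a stand-in for SQL Anywhere here, and its window functions need SQLite 3.25 or later. The table name mytable follows the answer, and the variable names are assumptions made for the demo.

```python
import sqlite3

# Sample rows (EMP, DOC, DATE) copied from the question.
rows = [
    (1, 78, "01/01"), (1, 96, "02/01"), (1, 96, "02/01"), (1, 105, "07/01"),
    (2, 4, "04/01"), (2, 7, "04/01"),
    (3, 45, "07/01"), (3, 45, "07/01"), (3, 67, "09/01"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (emp INTEGER, doc INTEGER, date TEXT)")
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", rows)


def ids_ordered_by(column: str) -> list:
    """Return the dense_rank ids, listed in (emp, doc) order, for one ORDER BY column."""
    sql = f"""
        SELECT emp, doc, date,
               DENSE_RANK() OVER (PARTITION BY emp ORDER BY {column}) AS id
        FROM mytable
        ORDER BY emp, doc
    """
    return [row[3] for row in conn.execute(sql)]


ids_by_doc = ids_ordered_by("doc")
ids_by_date = ids_ordered_by("date")

print(ids_by_doc)   # [1, 2, 2, 3, 1, 2, 1, 1, 2]  matches the expected ID column
print(ids_by_date)  # [1, 2, 2, 3, 1, 1, 1, 1, 2]  EMP 2 gets the same id twice
```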
If performance is not an issue in your case, you can try something like:
SELECT tx.EMP, tx.DOC, tx.DATE, y.ID
FROM table_xxx tx
JOIN (SELECT EMP, DOC, ROW_NUMBER() OVER (PARTITION BY EMP ORDER BY DOC) ID
FROM (SELECT EMP, DOC FROM table_xxx GROUP BY EMP, DOC) x) y
ON tx.EMP = y.EMP AND tx.DOC = y.DOC

Select rows where value changed in column

Currently I have this table in a SQL database, sorted by Account#.
Account# Charge_code PostingDate Balance
12345 35 1/18/2016 100
**12345 35 1/20/2016 200**
12345 61 1/23/2016 250
12345 61 1/22/2016 300
12222 41 1/20/2016 200
**12222 41 1/21/2016 250**
12222 42 1/23/2016 100
12222 42 1/25/2016 600
How do I select the last row before the charge_code changes for each Account#? I highlighted the rows that I am trying to return.
The query should execute quickly with the table having tens of thousands of records.
In SQL Server 2012+, you would use lead():
select t.*
from (select t.*,
lead(charge_code) over (partition by account order by postingdate) as next_charge_code
from t
) t
where charge_code <> next_charge_code;
In earlier versions of SQL Server, you can do something similar with apply.
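The lead() query compares each row's charge_code with that of the next row in the same account. As a rough sketch of the same comparison without window functions, the example below uses a correlated subquery to look up the next row. This is not SQL Server's apply syntax, only the same idea expressed in SQLite through Python's sqlite3 module. The ISO date strings and the column name account (standing in for Account#) are assumptions made for the demo.

```python
import sqlite3

# Sample rows from the question, with dates rewritten as ISO strings so they sort correctly.
rows = [
    (12345, 35, "2016-01-18", 100), (12345, 35, "2016-01-20", 200),
    (12345, 61, "2016-01-23", 250), (12345, 61, "2016-01-22", 300),
    (12222, 41, "2016-01-20", 200), (12222, 41, "2016-01-21", 250),
    (12222, 42, "2016-01-23", 100), (12222, 42, "2016-01-25", 600),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (account INTEGER, charge_code INTEGER, postingdate TEXT, balance INTEGER)"
)
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)

# For each row, fetch the charge_code of the next posting in the same account.
# The last row of an account has no next row, so the subquery yields NULL and
# the <> comparison filters that row out, just as in the lead() version.
last_before_change = conn.execute(
    """
    SELECT t.account, t.charge_code, t.postingdate, t.balance
    FROM t
    WHERE t.charge_code <> (
        SELECT n.charge_code
        FROM t AS n
        WHERE n.account = t.account
          AND n.postingdate > t.postingdate
        ORDER BY n.postingdate
        LIMIT 1
    )
    ORDER BY t.account, t.postingdate
    """
).fetchall()

print(last_before_change)
# [(12222, 41, '2016-01-21', 250), (12345, 35, '2016-01-20', 200)]
```

On a large table, an index on (account, postingdate) would likely matter for this form, since the subquery runs once per row.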

SQL Teradata query

I have a table abc that has many records, with the following columns:
dept | name | marks |
science abc 50
science cvv 21
science cvv 22
maths def 60
maths abc 21
maths def 62
maths ddd 90
I need to order by dept and name, ranking the names as ddd = 1, cvv = 2, abc = 3 and anything else = 4, and then find the maximum mark of each individual. The expected result is
dept | name | marks |
science cvv 22
science abc 50
maths ddd 90
maths abc 21
maths def 62
How can I do this?
SELECT
dept,
name,
MAX(marks) AS mark
FROM
yourTable
GROUP BY
dept,
name
ORDER BY
CASE WHEN name = 'ddd' THEN 1
WHEN name = 'cvv' THEN 2
WHEN name = 'abc' THEN 3
ELSE 4 END
Or, preferably, have another table that includes the sorting order.
SELECT
yourTable.dept,
yourTable.name,
MAX(yourTable.marks) AS mark
FROM
yourTable
INNER JOIN
anotherTable
ON yourTable.name = anotherTable.name
GROUP BY
yourTable.dept,
yourTable.name,
anotherTable.sortingOrder
ORDER BY
anotherTable.sortingOrder
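One thing to watch with a lookup table is that an INNER JOIN drops any name that has no row in it. The sketch below, run in SQLite through Python's sqlite3 module as a stand-in for Teradata, uses a LEFT JOIN with COALESCE so that unlisted names fall back to rank 4. The table and column names follow the answer above, while the lookup rows and the extra dept, name tie-breakers in the ORDER BY are assumptions made for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (dept TEXT, name TEXT, marks INTEGER)")
conn.execute("CREATE TABLE anotherTable (name TEXT, sortingOrder INTEGER)")

# Sample rows copied from the question.
conn.executemany(
    "INSERT INTO yourTable VALUES (?, ?, ?)",
    [
        ("science", "abc", 50), ("science", "cvv", 21), ("science", "cvv", 22),
        ("maths", "def", 60), ("maths", "abc", 21), ("maths", "def", 62),
        ("maths", "ddd", 90),
    ],
)
# Only the explicitly ranked names are listed; 'def' is deliberately absent.
conn.executemany(
    "INSERT INTO anotherTable VALUES (?, ?)",
    [("ddd", 1), ("cvv", 2), ("abc", 3)],
)

ranked = conn.execute(
    """
    SELECT y.dept, y.name, MAX(y.marks) AS mark
    FROM yourTable AS y
    LEFT JOIN anotherTable AS a ON a.name = y.name
    GROUP BY y.dept, y.name, a.sortingOrder
    ORDER BY COALESCE(a.sortingOrder, 4), y.dept, y.name
    """
).fetchall()

print(ranked)
# [('maths', 'ddd', 90), ('science', 'cvv', 22), ('maths', 'abc', 21),
#  ('science', 'abc', 50), ('maths', 'def', 62)]
```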
This should work:
SELECT Dept, Name, MAX(marks) AS mark
FROM yourTable
GROUP BY Dept, Name
ORDER BY CASE WHEN Name = 'ddd' THEN 1
WHEN Name = 'cvv' THEN 2
WHEN Name = 'abc' THEN 3
ELSE 4 END

How do you select latest entry per column1 and per column2?

I'm fairly new to mysql and need a query I just can't figure out. Given a table like so:
emp cat date amt cum
44 e1 2009-01-01 1 1
44 e2 2009-01-02 2 2
44 e1 2009-01-03 3 4
44 e1 2009-01-07 5 9
44 e7 2009-01-04 5 5
44 e2 2009-01-04 3 5
44 e7 2009-01-05 1 6
55 e7 2009-01-02 2 2
55 e1 2009-01-05 4 4
55 e7 2009-01-03 4 6
I need to select the latest date transaction per 'emp' and per 'cat'. The above table would produce something like:
emp cat date amt cum
44 e1 2009-01-07 5 9
44 e2 2009-01-04 3 5
44 e7 2009-01-05 1 6
55 e1 2009-01-05 4 4
55 e7 2009-01-03 4 6
I've tried something like:
select * from orders where emp=44 and category='e1' order by date desc limit 1;
select * from orders where emp=44 and category='e2' order by date desc limit 1;
....
but this doesn't feel right. Can anyone point me in the right direction?
This should work, but I haven't tested it.
SELECT orders.* FROM orders
INNER JOIN (
SELECT emp, cat, MAX(date) date
FROM orders
GROUP BY emp, cat
) criteria USING (emp, cat, date)
Basically, this uses a subquery to get the latest entry for each emp and cat, then joins that against the original table to get all the data for that order (since you can't GROUP BY amt and cum).
The answer given by @R.Bemrose should work, and here's another trick for comparison:
SELECT o1.*
FROM orders o1
LEFT OUTER JOIN orders o2
ON (o1.emp = o2.emp AND o1.cat = o2.cat AND o1.date < o2.date)
WHERE o2.emp IS NULL;
This assumes that the columns (emp, cat, date) form a candidate key, i.e. no two rows share the same emp, cat and date.
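Both approaches can be checked against the sample data. The sketch below runs them in SQLite through Python's sqlite3 module, used here only as a local stand-in for MySQL, and compares the results with the expected output from the question. The ORDER BY clauses are added for the demo so the two result lists are directly comparable.

```python
import sqlite3

# Sample rows (emp, cat, date, amt, cum) copied from the question.
rows = [
    (44, "e1", "2009-01-01", 1, 1), (44, "e2", "2009-01-02", 2, 2),
    (44, "e1", "2009-01-03", 3, 4), (44, "e1", "2009-01-07", 5, 9),
    (44, "e7", "2009-01-04", 5, 5), (44, "e2", "2009-01-04", 3, 5),
    (44, "e7", "2009-01-05", 1, 6), (55, "e7", "2009-01-02", 2, 2),
    (55, "e1", "2009-01-05", 4, 4), (55, "e7", "2009-01-03", 4, 6),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (emp INTEGER, cat TEXT, date TEXT, amt INTEGER, cum INTEGER)"
)
conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?, ?)", rows)

# Approach 1: join back to the per-group maximum date.
latest_by_join = conn.execute(
    """
    SELECT orders.* FROM orders
    INNER JOIN (
        SELECT emp, cat, MAX(date) AS date
        FROM orders
        GROUP BY emp, cat
    ) AS criteria USING (emp, cat, date)
    ORDER BY orders.emp, orders.cat
    """
).fetchall()

# Approach 2: keep rows for which no later row exists in the same (emp, cat) group.
latest_by_antijoin = conn.execute(
    """
    SELECT o1.*
    FROM orders o1
    LEFT OUTER JOIN orders o2
      ON (o1.emp = o2.emp AND o1.cat = o2.cat AND o1.date < o2.date)
    WHERE o2.emp IS NULL
    ORDER BY o1.emp, o1.cat
    """
).fetchall()

for row in latest_by_join:
    print(row)
```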