I have a table like this:
| id | name | segment | date_created | question | answer |
|----|------|---------|--------------|----------|--------|
| 1 | John | 1 | 2018-01-01 | 10 | 28 |
| 1 | John | 1 | 2018-01-01 | 14 | 37 |
| 1 | John | 1 | 2018-01-01 | 9 | 83 |
| 2 | Jack | 3 | 2018-03-11 | 22 | 13 |
| 2 | Jack | 3 | 2018-03-11 | 23 | 16 |
And I want to show this information in a single row, transpose all the questions and answers as columns:
| id | name | segment | date_created | question_01 | answer_01 | question_02 | answer_02 | question_03 | answer_03 |
|----|------|---------|--------------|-------------|-----------|-------------|-----------|-------------|-----------|
| 1 | John | 1 | 2018-01-01 | 10 | 28 | 14 | 37 | 9 | 83 |
| 2 | Jack | 3 | 2018-03-11 | 22 | 13 | 23 | 16 | | |
The number os questions/answers for the same ID is known. Maximum of 15.
I'm already tried using crosstab, but it only accepts a single value as category and I can have 2 (question/answer). Any help how to solve this?
You can try to use row_number to make a number in subquery then, do Aggregate function condition in the main query.
SELECT ID,
Name,
segment,
date_created,
max(CASE WHEN rn = 1 THEN question END) question_01 ,
max(CASE WHEN rn = 1 THEN answer END) answer_01 ,
max(CASE WHEN rn = 2 THEN question END) question_02,
max(CASE WHEN rn = 2 THEN answer END) answer_02,
max(CASE WHEN rn = 3 THEN question END) question_03,
max(CASE WHEN rn = 3 THEN answer END) answer_03
FROM (
select *,Row_number() over(partition by ID,Name,segment,date_created order by (select 1)) rn
from T
) t1
GROUP BY ID,Name,segment,date_created
sqlfiddle
[Results]:
| id | name | segment | date_created | question_01 | answer_01 | question_02 | answer_02 | question_03 | answer_03 |
|----|------|---------|--------------|-------------|-----------|-------------|-----------|-------------|-----------|
| 1 | John | 1 | 2018-01-01 | 1 | 28 | 14 | 37 | 9 | 83 |
| 2 | Jack | 3 | 2018-03-11 | 22 | 13 | 23 | 16 | (null) | (null) |
I have the following table from which I try to extract all cust_id who have bought an item for the first time in January.
I found a way with MySQL but I'm working with Hive and it doesn't work
Consider this table:
| cust_id | created | year | month | item |
|---------|---------------------|------|-------|------|
| 100 | 2017-01-01 19:20:00 | 2017 | 01 | ABC |
| 100 | 2017-01-01 19:20:00 | 2017 | 01 | DEF |
| 100 | 2017-01-08 22:45:00 | 2017 | 01 | GHI |
| 100 | 2017-08-03 08:01:00 | 2017 | 08 | JKL |
| 100 | 2017-01-01 21:23:00 | 2017 | 01 | MNO |
| 130 | 2016-12-06 06:42:00 | 2016 | 12 | PQR |
| 140 | 2017-01-21 15:01:00 | 2017 | 01 | STU |
| 130 | 2017-01-29 13:20:00 | 2017 | 01 | VWX |
| 140 | 2017-04-10 09:15:00 | 2017 | 04 | YZZ |
With the following query, it works:
SELECT
cust_id,
year,
month,
MIN(STR_TO_DATE(created, '%Y-%m-%d %H:%i:%s')) AS min_date
FROM
t1
GROUP BY
cust_id
HAVING
year = '2017'
AND
month= '01'
And it returns this table:
| cust_id | year | month | min_date |
|---------|------|-------|---------------------|
| 100 | 2017 | 01 | 2017-01-01 19:20:00 |
| 140 | 2017 | 01 | 2017-01-21 15:01:00 |
But in Hive, I cannot filter the fields year and month with HAVING if they have not been grouped by previously. In other words, the previous query fails.
Instead, the following runs but don't produce the expected result:
SELECT
cust_id,
year,
month,
MIN(unix_timestamp(created, 'yyyy-MM-dd HH:mm:ss')) AS min_date
FROM
t1
GROUP BY
cust_id, year, month
HAVING
year = '2017'
AND
month= '01'
cust_id 130 shows up even if the first purchase happened in december 2016
| cust_id | year | month | min_date |
|---------|------|-------|---------------------|
| 100 | 2017 | 01 | 2017-01-01 19:20:00 |
| 130 | 2017 | 01 | 2017-01-29 13:20:00 |
| 140 | 2017 | 01 | 2017-01-21 15:01:00 |
Here is the fiddle : SQL fiddle
Thank you
Your MySQL query doesn't really work, even if it runs. Never have "bare" columns in the group by or having or order by (of an aggregation query). All non-aggregated columns should be the arguments to an aggregation function. In your case, year and month fall into this category.
What you appear to want in either database is something like this:
SELECT cust_id
FROM t1
GROUP BY cust_id
HAVING MIN(created) >= '2017-01-01' AND
MIN(created) < '2017-02-01';
I'm trying to get a rank over groups of integers when ordered by date. Some of the groups will have the same value but be separated by other groups. For this reason I can't use DENSE_RANK as is puts the integers of the same value together. The values of 10 below would belong to the same group in DENSE_RANK, I would like them to be in ranked group 2 & 4. Thanks for any help.
| ID | Date | IntValue | DesiredRankResult |
| 1 | 01 Jan | 10 | 4 |
| 1 | 02 Jan | 10 | 4 |
| 1 | 03 Jan | 20 | 3 |
| 1 | 04 Jan | 20 | 3 |
| 1 | 05 Jan | 10 | 2 |
| 1 | 06 Jan | 10 | 2 |
| 1 | 07 Jan | 30 | 1 |
You can do this with using lead() and a cumulative sum. I think it looks like this:
select t.*,
sum(case when next_intval = intval then 0 else 1 end) over (partition by id order by date desc) as DesiredRankResult
from (select t.*,
lead(intval) over (partition by id order by date) as next_intval
from t
) t;
I want to select in one row the value of a column that appears in multiple rows, I have the table Solution:
| StudentID | SolutionDate | SolutionTime | SongID |
----------------------------------------------------
| 0824616 | 2015-09-20 | 00:07:00 | 01 |
| 0824616 | 2015-09-20 | 00:05:00 | 02 |
| 0824616 | 2015-09-21 | 00:07:40 | 01 |
| 0824616 | 2015-09-21 | 00:10:00 | 03 |
| 0824616 | 2015-09-23 | 00:04:30 | 03 |
| 0824616 | 2015-09-23 | 00:11:30 | 03 |
I want to group the records by StudentID and SongID.
The expected output is:
| StudentID | SongID | TimeA | TimeB | TimeC |
-------------------------------------------------------
| 0824616 | 01 | 00:07:00 | 00:07:40 | NULL |
| 0824616 | 02 | 00:05:00 | NULL | NULL |
| 0824616 | 03 | 00:10:00 | 00:04:30 | 00:11:30 |
There are 3 records by StudentID-SongID at the most. I'm using SQL Server 2012.
Try with window function first to number the rows and then use conditional aggregation:
;with cte as(select *, row_number() over(partition by studentid, songid
order by solutiondate, solutiontime) rn from tablename)
select studentid,
songid,
max(case when rn = 1 then solutiontime end) as timea,
max(case when rn = 2 then solutiontime end) as timeb,
max(case when rn = 3 then solutiontime end) as timec
from cte
group by studentid, songid
I have the following table.
Data_table
R_id I_id Metric CType Timespan Quantity Date
1 1 S C Week 100 4/5/2015
1 1 Q C Week 200 4/5/2015
1 1 I D Week 80 4/5/2015
1 2 S C Week 150 4/5/2015
1 2 Q C Week 100 4/5/2015
1 2 I D Week 50 4/5/2015
Metric can have a limited set of values (S, Q, I..)
CType will be C, D or nil.
Timespan can be Weekly/Daily.
Date will be a Sunday (start of week) for Weekly and that day's date for Daily.
My goal is to convert this to a daily view which would involve
If Timespan is Daily, copy the Quantity for the above metrics as it is.
Converting a Weekly quantity to 7 Daily quantities.
If the CType is D copy the quantity as it is.
If the CType is C use a constant percentage breakdown logic to distribute the weekly over 7 days.eg [30%, 10%, 10%, 5%, 10%, 15% 20%] = 100%
Creating the following VIEW.
R_id I_id Date S Q I ... (other metrics whose CType is not nil)
1 1 4/5/2015 30 60 80 ... (the quantity of the other metrics)
1 1 4/6/2015 10 20 80
1 1 4/7/2015 10 20 80
1 1 4/8/2015 5 10 80
1 1 4/9/2015 10 20 80
1 1 4/10/2015 15 30 80
1 1 4/11/2015 20 40 80
1 2 4/5/2015 45 30 50
1 2 4/6/2015 15 10 50
1 2 4/7/2015 15 10 50
1 2 4/8/2015 7.5 5 50
1 2 4/9/2015 15 10 50
1 2 4/10/2015 22.5 15 50
1 2 4/11/2015 30 20 50
I can write a bunch of java methods which will pull out the data from the above table and get the values for metrics as needed. But for a large dataset, the performance will not be very good. Databases are meant for this type of data computation. Once this view is created, I can quickly (and simply) query it to get what I want. I can write simple sql queries. But I have no clue how to even begin approaching this problem! I can see a PIVOT here (logically, I don't know how a query would or even can achieve it). But how to compute the 7 daily quantities from a weekly quantity and put it in the VIEW?
Suggestions and guidance will be much appreciated.
You can use hierarchical queries to generate daily data.
SQL Fiddle
Query:
select
r_id,
i_id,
metric,
ctype,
timespan,
quantity,
tdate + level - 1 as m_tdate,
level as m_level,
(case ctype
when 'C' then
(case level
when 1 then 0.3
when 2 then 0.1
when 3 then 0.1
when 4 then 0.05
when 5 then 0.1
when 6 then 0.15
when 7 then 0.2
end)
else 1
end) * quantity as m_quantity
from myt
where timespan = 'Week'
connect by level <= 7
and r_id = prior r_id
and i_id = prior i_id
and metric = prior metric
and ctype = prior ctype
and timespan = prior timespan
and prior sys_guid() is not null
This will generate seven day data for each record
Results:
| R_ID | I_ID | METRIC | CTYPE | TIMESPAN | QUANTITY | M_TDATE | M_LEVEL | M_QUANTITY |
|------|------|--------|-------|----------|----------|-----------------------|---------|------------|
| 1 | 1 | I | D | Week | 80 | May, 04 2015 00:00:00 | 1 | 80 |
| 1 | 1 | I | D | Week | 80 | May, 05 2015 00:00:00 | 2 | 80 |
| 1 | 1 | I | D | Week | 80 | May, 06 2015 00:00:00 | 3 | 80 |
| 1 | 1 | I | D | Week | 80 | May, 07 2015 00:00:00 | 4 | 80 |
| 1 | 1 | I | D | Week | 80 | May, 08 2015 00:00:00 | 5 | 80 |
| 1 | 1 | I | D | Week | 80 | May, 09 2015 00:00:00 | 6 | 80 |
| 1 | 1 | I | D | Week | 80 | May, 10 2015 00:00:00 | 7 | 80 |
| 1 | 1 | Q | C | Week | 200 | May, 04 2015 00:00:00 | 1 | 60 |
| 1 | 1 | Q | C | Week | 200 | May, 05 2015 00:00:00 | 2 | 20 |
| 1 | 1 | Q | C | Week | 200 | May, 06 2015 00:00:00 | 3 | 20 |
| 1 | 1 | Q | C | Week | 200 | May, 07 2015 00:00:00 | 4 | 10 |
| 1 | 1 | Q | C | Week | 200 | May, 08 2015 00:00:00 | 5 | 20 |
| 1 | 1 | Q | C | Week | 200 | May, 09 2015 00:00:00 | 6 | 30 |
| 1 | 1 | Q | C | Week | 200 | May, 10 2015 00:00:00 | 7 | 40 |
| 1 | 1 | S | C | Week | 100 | May, 04 2015 00:00:00 | 1 | 30 |
| 1 | 1 | S | C | Week | 100 | May, 05 2015 00:00:00 | 2 | 10 |
| 1 | 1 | S | C | Week | 100 | May, 06 2015 00:00:00 | 3 | 10 |
| 1 | 1 | S | C | Week | 100 | May, 07 2015 00:00:00 | 4 | 5 |
| 1 | 1 | S | C | Week | 100 | May, 08 2015 00:00:00 | 5 | 10 |
| 1 | 1 | S | C | Week | 100 | May, 09 2015 00:00:00 | 6 | 15 |
| 1 | 1 | S | C | Week | 100 | May, 10 2015 00:00:00 | 7 | 20 |
| 1 | 2 | I | D | Week | 50 | May, 04 2015 00:00:00 | 1 | 50 |
| 1 | 2 | I | D | Week | 50 | May, 05 2015 00:00:00 | 2 | 50 |
| 1 | 2 | I | D | Week | 50 | May, 06 2015 00:00:00 | 3 | 50 |
| 1 | 2 | I | D | Week | 50 | May, 07 2015 00:00:00 | 4 | 50 |
| 1 | 2 | I | D | Week | 50 | May, 08 2015 00:00:00 | 5 | 50 |
| 1 | 2 | I | D | Week | 50 | May, 09 2015 00:00:00 | 6 | 50 |
| 1 | 2 | I | D | Week | 50 | May, 10 2015 00:00:00 | 7 | 50 |
| 1 | 2 | Q | C | Week | 100 | May, 04 2015 00:00:00 | 1 | 30 |
| 1 | 2 | Q | C | Week | 100 | May, 05 2015 00:00:00 | 2 | 10 |
| 1 | 2 | Q | C | Week | 100 | May, 06 2015 00:00:00 | 3 | 10 |
| 1 | 2 | Q | C | Week | 100 | May, 07 2015 00:00:00 | 4 | 5 |
| 1 | 2 | Q | C | Week | 100 | May, 08 2015 00:00:00 | 5 | 10 |
| 1 | 2 | Q | C | Week | 100 | May, 09 2015 00:00:00 | 6 | 15 |
| 1 | 2 | Q | C | Week | 100 | May, 10 2015 00:00:00 | 7 | 20 |
| 1 | 2 | S | C | Week | 150 | May, 04 2015 00:00:00 | 1 | 45 |
| 1 | 2 | S | C | Week | 150 | May, 05 2015 00:00:00 | 2 | 15 |
| 1 | 2 | S | C | Week | 150 | May, 06 2015 00:00:00 | 3 | 15 |
| 1 | 2 | S | C | Week | 150 | May, 07 2015 00:00:00 | 4 | 7.5 |
| 1 | 2 | S | C | Week | 150 | May, 08 2015 00:00:00 | 5 | 15 |
| 1 | 2 | S | C | Week | 150 | May, 09 2015 00:00:00 | 6 | 22.5 |
| 1 | 2 | S | C | Week | 150 | May, 10 2015 00:00:00 | 7 | 30 |
Once you have this, you need to pivot the result, which can be done by simple GROUP BY
Query:
with x as (
select
r_id,
i_id,
metric,
ctype,
timespan,
quantity,
tdate + level - 1 as m_tdate,
level as m_level,
(case ctype
when 'C' then
(case level
when 1 then 0.3
when 2 then 0.1
when 3 then 0.1
when 4 then 0.05
when 5 then 0.1
when 6 then 0.15
when 7 then 0.2
end)
else 1
end) * quantity as m_quantity
from myt
where timespan = 'Week'
connect by level <= 7
and r_id = prior r_id
and i_id = prior i_id
and metric = prior metric
and ctype = prior ctype
and timespan = prior timespan
and prior sys_guid() is not null
UNION ALL
select
r_id,
i_id,
metric,
ctype,
timespan,
quantity,
tdate as m_tdate,
1 as m_level,
quantity as m_quantity
from myt
where timespan = 'Day'
)
select
r_id,
i_id,
m_tdate,
sum(case when metric = 'S' then m_quantity end) S,
sum(case when metric = 'Q' then m_quantity end) Q,
sum(case when metric = 'I' then m_quantity end) I
from x
group by
r_id,
i_id,
m_tdate
order by
r_id,
i_id,
m_tdate
Results:
| R_ID | I_ID | M_TDATE | S | Q | I |
|------|------|-------------------------|--------|--------|-----|
| 1 | 1 | May, 04 2015 00:00:00 | 30 | 60 | 80 |
| 1 | 1 | May, 05 2015 00:00:00 | 10 | 20 | 80 |
| 1 | 1 | May, 06 2015 00:00:00 | 10 | 20 | 80 |
| 1 | 1 | May, 07 2015 00:00:00 | 5 | 10 | 80 |
| 1 | 1 | May, 08 2015 00:00:00 | 10 | 20 | 80 |
| 1 | 1 | May, 09 2015 00:00:00 | 15 | 30 | 80 |
| 1 | 1 | May, 10 2015 00:00:00 | 20 | 40 | 80 |
| 1 | 2 | April, 03 2015 00:00:00 | (null) | (null) | 120 |
| 1 | 2 | May, 04 2015 00:00:00 | 45 | 30 | 50 |
| 1 | 2 | May, 05 2015 00:00:00 | 15 | 10 | 50 |
| 1 | 2 | May, 06 2015 00:00:00 | 15 | 10 | 50 |
| 1 | 2 | May, 07 2015 00:00:00 | 7.5 | 5 | 50 |
| 1 | 2 | May, 08 2015 00:00:00 | 15 | 10 | 50 |
| 1 | 2 | May, 09 2015 00:00:00 | 22.5 | 15 | 50 |
| 1 | 2 | May, 10 2015 00:00:00 | 30 | 20 | 50 |