Merge queries into one query - sql

I have the following two tables (with some sample data):
LOGS:
ID | SETID | DATE
========================
1 | 1 | 2010-02-25
2 | 2 | 2010-02-25
3 | 1 | 2010-02-26
4 | 2 | 2010-02-26
5 | 1 | 2010-02-27
6 | 2 | 2010-02-27
7 | 1 | 2010-02-28
8 | 2 | 2010-02-28
9 | 1 | 2010-03-01
STATS:
ID | OBJECTID | FREQUENCY | STARTID | ENDID
=============================================
1 | 1 | 0.5 | 1 | 5
2 | 2 | 0.6 | 1 | 5
3 | 3 | 0.02 | 1 | 5
4 | 4 | 0.6 | 2 | 6
5 | 5 | 0.6 | 2 | 6
6 | 6 | 0.4 | 2 | 6
7 | 1 | 0.35 | 3 | 7
8 | 2 | 0.6 | 3 | 7
9 | 3 | 0.03 | 3 | 7
10 | 4 | 0.6 | 4 | 8
11 | 5 | 0.6 | 4 | 8
7 | 1 | 0.45 | 5 | 9
8 | 2 | 0.6 | 5 | 9
9 | 3 | 0.02 | 5 | 9
Every day, new logs are analyzed on different sets of objects and stored in table LOGS.
Among other processes, some statistics are computed on the objects contained in these sets, and the results are stored in table STATS. These statistics are computed over several logs (identified by the STARTID and ENDID columns).
So, what SQL query would give me the latest computed stats for all the objects, with the corresponding log dates?
In the given example, the result rows would be:
OBJECTID | SETID | FREQUENCY | STARTDATE | ENDDATE
======================================================
1 | 1 | 0.45 | 2010-02-27 | 2010-03-01
2 | 1 | 0.6 | 2010-02-27 | 2010-03-01
3 | 1 | 0.02 | 2010-02-27 | 2010-03-01
4 | 2 | 0.6 | 2010-02-26 | 2010-02-28
5 | 2 | 0.6 | 2010-02-26 | 2010-02-28
So, the most recent stats for set 1 are computed with logs from Feb 27 to March 1, whereas stats for set 2 are computed from Feb 26 to Feb 28.
Object 6 is not in the result rows, as there is no stat on it within the last period of time.
Last thing: I use MySQL.
Any ideas?

Does this query answer your question?
SELECT objectid, l1.setid, frequency, l1.date AS startdate, l2.date AS enddate
FROM `logs` l1
INNER JOIN `stats` s ON (s.startid = l1.id)
INNER JOIN `logs` l2 ON (l2.id = s.endid)
INNER JOIN
(
    SELECT setid, MAX(date) AS date
    FROM `logs` l
    INNER JOIN `stats` s ON (s.startid = l.id)
    GROUP BY setid
) d ON (d.setid = l1.setid AND d.date = l1.date)
ORDER BY objectid

If there are no ties, you can use a filtering join. For example:
select  stats.objectid
,       stats.frequency
,       startlog.setid
,       startlog.date
,       endlog.date
from    stats
join    logs startlog
on      startlog.id = stats.startid
join    logs endlog
on      endlog.id = stats.endid
join    (
        select  objectid, max(endlog.date) as maxenddate
        from    stats
        join    logs endlog
        on      endlog.id = stats.endid
        group by objectid
        ) filter
on      stats.objectid = filter.objectid
        and filter.maxenddate = endlog.date
order by stats.objectid
Your example results appear to be slightly off; for example, there is no row for objectid 5 where the frequency equals 0.35.
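To compare the two answers above on the sample data, here is a minimal sketch in Python. It makes one loud assumption: SQLite (through the standard `sqlite3` module) stands in for MySQL purely so the queries can be run anywhere, and the aliases `sl`, `el`, `f` and the variable names are made up for illustration. It suggests that the first query picks the latest period per set, while the second picks the latest period per object and therefore also returns object 6.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL only so this sketch is runnable.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (id INTEGER, setid INTEGER, date TEXT)")
conn.execute(
    "CREATE TABLE stats (objectid INTEGER, frequency REAL, startid INTEGER, endid INTEGER)"
)
conn.executemany("INSERT INTO logs VALUES (?, ?, ?)", [
    (1, 1, "2010-02-25"), (2, 2, "2010-02-25"), (3, 1, "2010-02-26"),
    (4, 2, "2010-02-26"), (5, 1, "2010-02-27"), (6, 2, "2010-02-27"),
    (7, 1, "2010-02-28"), (8, 2, "2010-02-28"), (9, 1, "2010-03-01"),
])
conn.executemany("INSERT INTO stats VALUES (?, ?, ?, ?)", [
    (1, 0.5, 1, 5), (2, 0.6, 1, 5), (3, 0.02, 1, 5),
    (4, 0.6, 2, 6), (5, 0.6, 2, 6), (6, 0.4, 2, 6),
    (1, 0.35, 3, 7), (2, 0.6, 3, 7), (3, 0.03, 3, 7),
    (4, 0.6, 4, 8), (5, 0.6, 4, 8),
    (1, 0.45, 5, 9), (2, 0.6, 5, 9), (3, 0.02, 5, 9),
])

# First answer: keep only stats whose start log is the latest start log of its set.
latest_per_set = conn.execute("""
    SELECT s.objectid, l1.setid, s.frequency, l1.date AS startdate, l2.date AS enddate
    FROM logs l1
    JOIN stats s ON s.startid = l1.id
    JOIN logs l2 ON l2.id = s.endid
    JOIN (
        SELECT l.setid, MAX(l.date) AS date
        FROM logs l
        JOIN stats s ON s.startid = l.id
        GROUP BY l.setid
    ) d ON d.setid = l1.setid AND d.date = l1.date
    ORDER BY s.objectid
""").fetchall()

# Second answer: keep, for each object, the stat with the latest end date.
latest_per_object = conn.execute("""
    SELECT s.objectid, s.frequency, sl.setid, sl.date AS startdate, el.date AS enddate
    FROM stats s
    JOIN logs sl ON sl.id = s.startid
    JOIN logs el ON el.id = s.endid
    JOIN (
        SELECT s2.objectid, MAX(e2.date) AS maxenddate
        FROM stats s2
        JOIN logs e2 ON e2.id = s2.endid
        GROUP BY s2.objectid
    ) f ON f.objectid = s.objectid AND f.maxenddate = el.date
    ORDER BY s.objectid
""").fetchall()

for row in latest_per_set:
    print(row)
```

If object 6 must be excluded, as in the question, the per-set filter looks like the closer fit.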

Related

SQL How to summarize integer/numeric values on different rows

I am trying to merge integer and numeric values from different SQL rows within the same table into one row so that they are summarized.
| ID | Count | Total Payment
1 | 1 | 5 | 10.99
2 | 1 | 3 | 4.86
3 | 2 | 8 | 19.88
4 | 2 | 2 | 15.99
5 | 2 | 5 | 8.45
6 | 3 | 4 | 12.98
7 | 3 | 10 | 40.42
As such I want to summarize the above rows into the below rows.
| ID | Count | Total Payment
1 | 1 | 8 | 15.85
2 | 2 | 15 | 44.32
3 | 3 | 14 | 53.40
How do I do this?
Thank you HonyBadger and Mathieu Guindon.
The correct code was:
SELECT [id], SUM([count]) AS [count], SUM([total_payment]) AS [total_payment]
FROM [table_name]
GROUP BY [id]
-- order by the grouping column; the raw, ungrouped columns are not valid here
ORDER BY [id];
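As a quick check of the GROUP BY approach against the sample rows, here is a small sketch in Python. The assumptions: SQLite (via `sqlite3`) replaces the original database so that it runs standalone, and the table name `payments` and alias `total_count` are hypothetical.

```python
import sqlite3

# Assumption: SQLite and the table name "payments" are stand-ins for illustration.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE payments (id INTEGER, "count" INTEGER, total_payment REAL)')
conn.executemany("INSERT INTO payments VALUES (?, ?, ?)", [
    (1, 5, 10.99), (1, 3, 4.86),
    (2, 8, 19.88), (2, 2, 15.99), (2, 5, 8.45),
    (3, 4, 12.98), (3, 10, 40.42),
])

# One output row per id, with both numeric columns summed.
totals = conn.execute("""
    SELECT id,
           SUM("count")                 AS total_count,
           ROUND(SUM(total_payment), 2) AS total_payment
    FROM payments
    GROUP BY id
    ORDER BY id
""").fetchall()

for row in totals:
    print(row)
```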

Display Rows if Column Value is repeated

I have a SQL table that looks like this:
DATA | TEST_ID | PARAM_ID
-------------------------------------
c:\desktop\image1| 11 | 1
c:\desktop\image2| 12 | 1
c:\desktop\image3| 13 | 1
c:\desktop\image4| 14 | 1
Fail | 14 | 2
0.45 | 14 | 3
c:\desktop\image5| 15 | 1
Fail | 15 | 2
0.68 | 15 | 3
c:\desktop\image6| 16 | 1
Fail | 16 | 2
0.25 | 16 | 3
I would like to create a query where the result only shows DATA if TEST_ID has the same value repeated 3 times.
Ideal Result:
DATA | TEST_ID | PARAM_ID
-------------------------------------
c:\desktop\image4| 14 | 1
Fail | 14 | 2
0.45 | 14 | 3
c:\desktop\image5| 15 | 1
Fail | 15 | 2
0.68 | 15 | 3
c:\desktop\image6| 16 | 1
Fail | 16 | 2
0.25 | 16 | 3
Would the best approach be to use COUNT(*)>2 for the TEST_ID column?
Use window functions:
select t.*
from (select t.*, count(*) over (partition by test_id) as cnt
      from t
     ) t
where cnt >= 3;
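Regarding the COUNT(*) idea from the question: a GROUP BY ... HAVING subquery should give the same rows without window functions. This is only a sketch of that alternative, not the answer's method. It assumes SQLite (via Python's `sqlite3`) as a stand-in engine, and the table name `t` is taken from the answer above.

```python
import sqlite3

# Assumption: SQLite is used only to make the sketch runnable.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (data TEXT, test_id INTEGER, param_id INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", [
    (r"c:\desktop\image1", 11, 1),
    (r"c:\desktop\image2", 12, 1),
    (r"c:\desktop\image3", 13, 1),
    (r"c:\desktop\image4", 14, 1), ("Fail", 14, 2), ("0.45", 14, 3),
    (r"c:\desktop\image5", 15, 1), ("Fail", 15, 2), ("0.68", 15, 3),
    (r"c:\desktop\image6", 16, 1), ("Fail", 16, 2), ("0.25", 16, 3),
])

# Keep every row whose test_id occurs at least three times in the table.
repeated = conn.execute("""
    SELECT data, test_id, param_id
    FROM t
    WHERE test_id IN (
        SELECT test_id
        FROM t
        GROUP BY test_id
        HAVING COUNT(*) >= 3
    )
    ORDER BY test_id, param_id
""").fetchall()

for row in repeated:
    print(row)
```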

How to sum 2 columns and add it with the previous summed columns in sql?

I have a table with these rows:
+------+--------+---------+---------+
| ID | Date | Amount1 | Amount2 |
+------+--------+---------+---------+
| 1 | 13 Nov | 8 | 3 |
| 2 | 11 Nov | 5 | 1 |
| 3 | 15 Nov | 0 | 3 |
| 4 | 18 Nov | 5 | 7 |
| 5 | 20 Nov | 10 | 0 |
+------+--------+---------+---------+
I would like a query that produces this result, using the formula
Total = (Amount1 - Amount2) + Previous Row's Total
+------+--------+---------+---------+---------+
| ID | Date | Plus | Minus | Total |
+------+--------+---------+---------+---------+
| 2 | 11 Nov | 5 | 1 | 4 |
| 1 | 13 Nov | 8 | 3 | 9 |
| 3 | 15 Nov | 0 | 3 | 6 |
| 4 | 18 Nov | 5 | 7 | 4 |
| 5 | 20 Nov | 10 | 0 | 14 |
+------+--------+---------+---------+---------+
Is there any way to query this without binding the Total to a column in a temporary table?
To get a running total, you can use SUM(columnname) OVER (ORDER BY sortedcolumnname).
To me it's actually a little counterintuitive compared to most windowed functions, as it doesn't have a partition but produces different results over the set of rows. However, it does work.
Here is some somewhat-obfuscated documentation from Microsoft about it.
I think you can therefore use
SELECT mt.[ID],
       mt.[Date],
       mt.[Amount1] AS [Plus],
       mt.[Amount2] AS [Minus],
       SUM(mt.[Amount1] - mt.[Amount2]) OVER (ORDER BY mt.[Date], mt.[ID]) AS Total
FROM mytable mt
ORDER BY mt.[Date],
         mt.[ID];
And here are the results - they match yours.
ID Date Plus Minus Total
2 2020-11-11 5 1 4
1 2020-11-13 8 3 9
3 2020-11-15 0 3 6
4 2020-11-18 5 7 4
5 2020-11-20 10 0 14
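On why this works without a partition: when OVER contains an ORDER BY and no explicit frame, the default frame runs from the start of the partition up to the current row, so each row sums everything before it plus itself. With an empty OVER (), the frame is the whole result set. The sketch below contrasts the two. It assumes SQLite 3.25 or later (via Python's `sqlite3`) as a stand-in for SQL Server, and the column aliases are made up.

```python
import sqlite3

# Assumption: SQLite >= 3.25 (window function support) stands in for SQL Server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (id INTEGER, date TEXT, amount1 INTEGER, amount2 INTEGER)"
)
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?)", [
    (1, "2020-11-13", 8, 3),
    (2, "2020-11-11", 5, 1),
    (3, "2020-11-15", 0, 3),
    (4, "2020-11-18", 5, 7),
    (5, "2020-11-20", 10, 0),
])

rows = conn.execute("""
    SELECT id,
           -- ORDER BY inside OVER: the frame ends at the current row (running total)
           SUM(amount1 - amount2) OVER (ORDER BY date, id) AS running_total,
           -- empty OVER: the frame is every row (same grand total on each row)
           SUM(amount1 - amount2) OVER ()                  AS grand_total
    FROM mytable
    ORDER BY date, id
""").fetchall()

for row in rows:
    print(row)
```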
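You can achieve this using a CTE followed by a self join. Note that amount1 - amount2 for id=3 is 0 - 3 = -3, and that this query adds only the previous row's difference rather than accumulating all earlier rows, so the totals below differ from the expected output from id=3 onward.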
DECLARE @t table(id int, dateval date, amount1 int, amount2 int);
INSERT INTO @t
VALUES
(1, '2020-11-13', 8, 3),
(2, '2020-11-11', 5, 1),
(3, '2020-11-15', 0, 3),
(4, '2020-11-18', 5, 7),
(5, '2020-11-20', 10, 0);
;WITH CTE_First AS
(
    -- per-row difference, numbered in date order
    SELECT id, dateval, amount1 AS plus, amount2 AS minus, (amount1 - amount2) AS total,
           ROW_NUMBER() OVER (ORDER BY dateval) AS rnk
    FROM @t
)
-- add the difference of the immediately preceding row (0 for the first row)
SELECT c.id, c.dateval, c.plus, c.minus, c.total + ISNULL(c1.total, 0) AS new_total
FROM CTE_First AS c
LEFT OUTER JOIN CTE_First AS c1
    ON c1.rnk = c.rnk - 1
ORDER BY c.rnk;
+----+------------+------+-------+-----------+
| ID | DATEVAL | plus | minus | new_total |
+----+------------+------+-------+-----------+
| 2 | 2020-11-11 | 5 | 1 | 4 |
| 1 | 2020-11-13 | 8 | 3 | 9 |
| 3 | 2020-11-15 | 0 | 3 | 2 |
| 4 | 2020-11-18 | 5 | 7 | -5 |
| 5 | 2020-11-20 | 10 | 0 | 8 |
+----+------------+------+-------+-----------+

SQL generate unique ID from rolling ID

I've been trying to find an answer to this for the better part of a day with no luck.
I have a SQL table with measurement data for samples and I need a way to assign a unique ID to each sample. Right now each sample has an ID number that rolls over frequently. What I need is a unique ID for each sample. Below is a table with a simplified dataset, as well as an example of a possible UID that would do what I need.
| Row | Time | Meas# | Sample# | UID (Desired) |
| 1 | 09:00 | 1 | 1 | 1 |
| 2 | 09:01 | 2 | 1 | 1 |
| 3 | 09:02 | 3 | 1 | 1 |
| 4 | 09:07 | 1 | 2 | 2 |
| 5 | 09:08 | 2 | 2 | 2 |
| 6 | 09:09 | 3 | 2 | 2 |
| 7 | 09:24 | 1 | 3 | 3 |
| 8 | 09:25 | 2 | 3 | 3 |
| 9 | 09:25 | 3 | 3 | 3 |
| 10 | 09:47 | 1 | 1 | 4 |
| 11 | 09:47 | 2 | 1 | 4 |
| 12 | 09:49 | 3 | 1 | 4 |
My problem is that rows 10-12 have the same Sample# as rows 1-3. I need a way to uniquely identify and group each sample. Having the row number or time of the first measurement on the sample would be good.
One other complication is that the measurement number doesn't always start with 1. It's based on measurement locations, and sometimes it skips location 1 and only has locations 2 and 3.
I am going to speculate that you want a unique number assigned to each sample, where now you have repeats.
If so, you can use lag() and a cumulative sum:
select t.*,
       sum(case when prev_sample = sample then 0 else 1 end) over (order by row) as new_sample_number
from (select t.*,
             lag(sample) over (order by row) as prev_sample
      from t
     ) t;
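To see the lag() plus cumulative sum idea on the question's data, here is a minimal sketch in Python. The assumptions: SQLite 3.25 or later (via `sqlite3`) as a stand-in engine, and hypothetical names `measurements`, `row_no`, `meas_no`, `sample_no` and `uid` in place of the original column names. Each row is flagged 1 when its sample number differs from the previous row's, and the running sum of those flags becomes the unique ID.

```python
import sqlite3

# Assumption: SQLite >= 3.25 and the table/column names are stand-ins for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE measurements (row_no INTEGER, time TEXT, meas_no INTEGER, sample_no INTEGER)"
)
conn.executemany("INSERT INTO measurements VALUES (?, ?, ?, ?)", [
    (1, "09:00", 1, 1), (2, "09:01", 2, 1), (3, "09:02", 3, 1),
    (4, "09:07", 1, 2), (5, "09:08", 2, 2), (6, "09:09", 3, 2),
    (7, "09:24", 1, 3), (8, "09:25", 2, 3), (9, "09:25", 3, 3),
    (10, "09:47", 1, 1), (11, "09:47", 2, 1), (12, "09:49", 3, 1),
])

uids = conn.execute("""
    SELECT m.row_no,
           -- a change of sample number (or the first row, where prev is NULL) counts as 1
           SUM(CASE WHEN m.prev_sample = m.sample_no THEN 0 ELSE 1 END)
               OVER (ORDER BY m.row_no) AS uid
    FROM (
        SELECT row_no, sample_no,
               LAG(sample_no) OVER (ORDER BY row_no) AS prev_sample
        FROM measurements
    ) m
    ORDER BY m.row_no
""").fetchall()

for row in uids:
    print(row)
```

Because the comparison uses only the sample number, it should not matter whether the measurement number starts at 1.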

hive - split a row into multiple rows between the range of values

I have a table below and would like to split the rows by the range from start to end columns.
i.e., id and value should repeat for each value between start and end (both inclusive)
--------------------------------------
id | value | start | end
--------------------------------------
1 | 5 | 1 | 4
2 | 8 | 5 | 9
--------------------------------------
Desired output
--------------------------------------
id | value | current
--------------------------------------
1 | 5 | 1
1 | 5 | 2
1 | 5 | 3
1 | 5 | 4
2 | 8 | 5
2 | 8 | 6
2 | 8 | 7
2 | 8 | 8
2 | 8 | 9
--------------------------------------
I can write my own UDF in Java/Python to get this result, but I would like to check whether I can implement it in Hive SQL using any existing Hive UDFs.
Thanks in advance.
This can be accomplished with a recursive common table expression, which Hive doesn't support.
One option is to create a table of numbers and use it to generate rows between start and end.
create table numbers
location 'hdfs_location' as
select row_number() over(order by somecolumn) as num
from some_table --this can be any table with the desired number of rows
;
--Join it with the existing table
select t.id,t.value,n.num as current
from tbl t
join numbers n on n.num>=t.start and n.num<=t.end
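The numbers-table join can be tried outside Hive as well. Below is a minimal sketch in Python under these assumptions: SQLite (via `sqlite3`) replaces Hive, the numbers table is filled from Python with 1 to 10 instead of row_number(), and the column names `range_start`, `range_end` and `current_value` are hypothetical replacements for start, end and current.

```python
import sqlite3

# Assumption: SQLite stands in for Hive; names are changed to avoid keyword clashes.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl (id INTEGER, value INTEGER, range_start INTEGER, range_end INTEGER)"
)
conn.execute("CREATE TABLE numbers (num INTEGER)")
conn.executemany("INSERT INTO tbl VALUES (?, ?, ?, ?)", [(1, 5, 1, 4), (2, 8, 5, 9)])
# The numbers table must cover at least the largest range_end in tbl.
conn.executemany("INSERT INTO numbers VALUES (?)", [(n,) for n in range(1, 11)])

# Each tbl row is repeated once per number inside its inclusive range.
expanded = conn.execute("""
    SELECT t.id, t.value, n.num AS current_value
    FROM tbl t
    JOIN numbers n
      ON n.num >= t.range_start AND n.num <= t.range_end
    ORDER BY t.id, n.num
""").fetchall()

for row in expanded:
    print(row)
```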
You can do this using posexplode().
WITH
data AS (
SELECT 1 AS id, 5 AS value, 1 AS start, 4 AS `end`
UNION ALL
SELECT 2 AS id, 8 AS value, 5 AS start, 9 AS `end`
)
SELECT distinct id, value, (zr.start+rge.diff) as `current`
FROM data zr LATERAL VIEW posexplode(split(space(zr.`end`-zr.start),' ')) rge as diff, x
Here is the output:
+-----+--------+----------+--+
| id | value | current |
+-----+--------+----------+--+
| 1 | 5 | 1 |
| 1 | 5 | 2 |
| 1 | 5 | 3 |
| 1 | 5 | 4 |
| 2 | 8 | 5 |
| 2 | 8 | 6 |
| 2 | 8 | 7 |
| 2 | 8 | 8 |
| 2 | 8 | 9 |
+-----+--------+----------+--+
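For anyone puzzled by the space()/split() trick: space(end - start) builds a string of that many spaces, splitting it on a space yields one more empty piece than there are spaces, and posexplode then numbers those pieces from 0. The Python sketch below only emulates that logic; it is not Hive code, the function name `expand` is made up, and it assumes split keeps trailing empty strings, as the output above suggests.

```python
def expand(start, end):
    """Emulate start + pos for each pos produced by posexplode(split(space(end - start), ' '))."""
    # space(n): a string of n spaces; splitting on ' ' gives n + 1 empty strings
    pieces = (" " * (end - start)).split(" ")
    # posexplode yields (position, value) pairs; only the position is used
    return [start + pos for pos, _ in enumerate(pieces)]


data = [(1, 5, 1, 4), (2, 8, 5, 9)]  # (id, value, start, end) from the question
rows = [
    (id_, value, current)
    for id_, value, start, end in data
    for current in expand(start, end)
]

for row in rows:
    print(row)
```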