SQL query: please help with weird output (counting each client's latest status)

I asked this question on SO before and already got a solution; however, that SQL query produced only partially correct results. I have tried to figure it out, but it seems too complicated for me to understand, so would you please kindly help me. Thank you.
For table tbl_sActivity:
act_id | Client_id | act_status | user_id | act_date
1      | 7         | warm       | 1       | 19/7/12
2      | 7         | dealed     | 1       | 30/7/12  <- latest status of client 7
3      | 8         | hot        | 1       | 6/8/12   <- latest status of client 8
4      | 5         | cold       | 22      | 7/8/12   <- latest status of client 5
5      | 6         | cold       | 1       | 16/7/12
6      | 6         | warm       | 1       | 18/7/12
7      | 6         | dealed     | 1       | 7/8/12   <- latest status of client 6
8      | 9         | warm       | 26      | 2/8/12
9      | 10        | warm       | 26      | 2/8/12
10     | 9         | hot        | 26      | 4/8/12   <- latest status of client 9
11     | 10        | hot        | 26      | 4/8/12
12     | 10        | dealed     | 26      | 10/8/12  <- latest status of client 10
13     | 13        | dealed     | 26      | 8/8/12   <- latest status of client 13
14     | 11        | hot        | 25      | 8/8/12
15     | 11        | dealed     | 25      | 14/8/12  <- latest status of client 11
I want to produce the User Progressive Report, which shows the latest progress of how each user follows up with his/her clients.
The report must therefore show the number of each user's clients, grouped by each client's latest status.
The correct output should look like this:
user_id | act_status | Count(act_status)
1       | dealed     | 2
1       | hot        | 1
22      | cold       | 1
25      | dealed     | 1
26      | hot        | 1
26      | dealed     | 2
The SQL query that I had is below:
select user_id, act_status, count(act_status)
from your_table
where act_date in (
    select max(act_date)
    from your_table
    group by Client_id
)
group by user_id, act_status
But when I add more transactions to the database, the output goes wrong, like this:
user_id | act_status | Count(act_status)
1       | dealed     | 2
1       | hot        | 1
22      | cold       | 1
25      | hot        | 1   ** (this one shouldn't show up)
25      | dealed     | 1
26      | hot        | 2   ** (should be only '1')
26      | dealed     | 2
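The likely cause: the condition act_date in (select max(act_date) ... group by Client_id) builds one flat list of dates, so any row whose date equals any client's latest date slips through. In the data above, client 11's 'hot' row on 8/8/12 matches client 13's max date, producing the stray 25 | hot | 1, and client 10's 'hot' row on 4/8/12 matches client 9's max date, inflating 26 | hot to 2. A correlated subquery ties the max date to each row's own client - a sketch, assuming the table is named tbl_sActivity as above:

select user_id, act_status, count(act_status)
from tbl_sActivity as t
where act_date = (
    select max(act_date)        -- latest date for this row's own client only
    from tbl_sActivity
    where client_id = t.client_id
)
group by user_id, act_status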

Tried and tested:
SELECT S.user_id, S.act_status, Count(S.act_status) AS CountOfact_status
FROM tbl_sActivity AS S
INNER JOIN [SELECT A.client_id, Max(A.act_date) AS max_act
            FROM tbl_sActivity AS A
            GROUP BY A.client_id]. AS S2
    ON (S.act_date = S2.max_act) AND (S.client_id = S2.client_id)
GROUP BY S.user_id, S.act_status
This is similar to an answer I gave yesterday. Please try to read through my query rather than just using it; that will hopefully help you understand, and it is the only way to learn :)
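(A side note on syntax: the [ ... ]. AS S2 block is Access's way of writing a derived table. On most other engines the same query would use plain parentheses - a sketch in generic SQL:)

SELECT S.user_id, S.act_status, COUNT(S.act_status) AS CountOfact_status
FROM tbl_sActivity AS S
INNER JOIN (SELECT A.client_id, MAX(A.act_date) AS max_act
            FROM tbl_sActivity AS A
            GROUP BY A.client_id) AS S2
    ON S.act_date = S2.max_act
   AND S.client_id = S2.client_id
GROUP BY S.user_id, S.act_status;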

Would it not be better to use the primary key column, act_id, instead of the date column, and thus get the max primary key?
What I meant was, instead of this:
where act_date in (
    select max(act_date)
    from your_table
    group by Client_id
)
something like this:
where act_id in (
    select max(act_id) -- for each client; can't think of the exact syntax right now, but this could point you in the right direction
)
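Completing that thought - a sketch, assuming act_id is an auto-incrementing key, so the highest act_id per client is also that client's latest activity:

select user_id, act_status, count(act_status)
from tbl_sActivity
where act_id in (
    select max(act_id)          -- latest activity per client
    from tbl_sActivity
    group by client_id
)
group by user_id, act_status

Because act_id is unique, the IN list cannot accidentally match another client's row that happens to share the same date, which is exactly how the date-based version goes wrong.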
BugFinder's answer looks better.

select user_id, act_status, count(act_status)
from your_table
inner join (
    select client_id, max(act_date) as act_date
    from your_table
    group by client_id
) as t
    on t.client_id = your_table.client_id
   and t.act_date = your_table.act_date
group by user_id, act_status
This should give you what you wanted.
I.e., make a table of max dates by client, then link it back to your table using both fields, to be precise.

Related

Select only record until timestamp from another table

I have three tables.
The first one is Device table
+----------+------+
| DeviceId | Type |
+----------+------+
| 1        | 10   |
| 2        | 20   |
| 3        | 30   |
+----------+------+
The second one is History table - data received by different devices.
+----------+-------------+--------------------+
| DeviceId | Temperature | TimeStamp          |
+----------+-------------+--------------------+
| 1        | 31          | 15.08.2020 1:42:00 |
| 2        | 100         | 15.08.2020 1:42:01 |
| 2        | 40          | 15.08.2020 1:43:00 |
| 1        | 32          | 15.08.2020 1:44:00 |
| 1        | 34          | 15.08.2020 1:45:00 |
| 3        | 20          | 15.08.2020 1:46:00 |
| 2        | 45          | 15.08.2020 1:47:00 |
+----------+-------------+--------------------+
The third one is DeviceStatusHistory table
+----------+----------+--------------------+
| DeviceId | State    | TimeStamp          |
+----------+----------+--------------------+
| 1        | 1 (OK)   | 15.08.2020 1:42:00 |
| 2        | 1 (OK)   | 15.08.2020 1:43:00 |
| 1        | 1 (OK)   | 15.08.2020 1:44:00 |
| 1        | 0 (FAIL) | 15.08.2020 1:44:30 |
| 1        | 0 (FAIL) | 15.08.2020 1:46:00 |
| 2        | 0 (FAIL) | 15.08.2020 1:46:10 |
+----------+----------+--------------------+
I want to select the last temperature of each device, but take into account only the history records that occur before the device's first failure.
Since device 1 starts failing at 15.08.2020 1:44:30, I don't want its records after that timestamp.
The same goes for device 2.
So, as a final result, I want only the data of each device up until its first FAIL status:
+----------+-------------+--------------------+
| DeviceId | Temperature | TimeStamp          |
+----------+-------------+--------------------+
| 2        | 40          | 15.08.2020 1:43:00 |
| 1        | 32          | 15.08.2020 1:44:00 |
| 3        | 20          | 15.08.2020 1:46:00 |
+----------+-------------+--------------------+
I can select the appropriate history only if a device has failed at least once:
SELECT * FROM Device D
CROSS APPLY
    (SELECT TOP 1 * FROM History H
     WHERE D.Id = H.DeviceId
       and H.DeviceTimeStamp <
           (select MIN(UpdatedOn) from DeviceStatusHistory Y where [State]=0 and DeviceId=D.Id)
     ORDER BY H.DeviceTimeStamp desc) X
ORDER BY D.Id;
The problem is, if a device never fails, I don't get its history at all.
Update:
My idea is to use something like this:
SELECT * FROM DeviceHardwarePart HP
CROSS APPLY
    (SELECT TOP 1 * FROM History H
     WHERE HP.Id = H.DeviceId
       and H.DeviceTimeStamp <
           (select ISNULL((select MIN(UpdatedOn) from DeviceMetadataPart where [State]=0 and DeviceId=HP.Id),
                          cast('12/31/9999 23:59:59.997' as datetime)))
     ORDER BY H.DeviceTimeStamp desc) X
ORDER BY HP.Id;
I'm not sure whether it is a good solution
You can use COALESCE: coalesce(min(UpdatedOn), cast('9999-12-31 23:59:59' as datetime)). This ensures you always have an upper bound for your select instead of NULL.
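(To sanity-check that suggestion against the Update query above - a sketch reusing its assumed table and column names, with @DeviceId standing in for the device being probed:)

declare @DeviceId int = 1;

select coalesce(min(UpdatedOn), cast('9999-12-31 23:59:59' as datetime)) as first_failure
from DeviceMetadataPart
where [State] = 0 and DeviceId = @DeviceId;

This returns the device's first failure time, or the far-future sentinel when it never failed, and can replace the nested ISNULL(SELECT ...) expression in the CROSS APPLY.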
I would treat this as a two-part problem:
First, find the time at which each device failed; if it hasn't failed, substitute a large sentinel value such as a timestamp in 2099.
Once I have that, I can simply join with the History table and take the latest value before the failure timestamp.
For the first part there are several possible approaches; off the top of my head, something like the following should work (note the state values live in the status table, so the subquery reads from DeviceStatusHistory):
select device_id,
       coalesce(min(failed_timestamps), cast('2099-01-01 01:01:01' as timestamp)) as failed_at
from (select device_id,
             case when state = 0 then timestamp else null end as failed_timestamps
      from DeviceStatusHistory) as X
group by device_id
This gives us the minimum failed timestamp for each device, and an arbitrarily large value for the devices that have never failed.
I guess after this the solution is straightforward.
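Putting both parts together against the simplified tables shown at the top of this question (the asker's real schema appears to use different names such as DeviceTimeStamp and UpdatedOn, so treat this as a sketch rather than a drop-in):

SELECT H.DeviceId, H.Temperature, H.TimeStamp
FROM History H
JOIN (
    -- first failure per device, or a far-future sentinel if it never failed
    SELECT D.DeviceId,
           COALESCE(MIN(CASE WHEN S.[State] = 0 THEN S.TimeStamp END),
                    CAST('9999-12-31' AS datetime)) AS failed_at
    FROM Device D
    LEFT JOIN DeviceStatusHistory S ON S.DeviceId = D.DeviceId
    GROUP BY D.DeviceId
) F ON F.DeviceId = H.DeviceId
WHERE H.TimeStamp < F.failed_at
  AND H.TimeStamp = (
      -- latest reading for this device before its first failure
      SELECT MAX(H2.TimeStamp)
      FROM History H2
      WHERE H2.DeviceId = H.DeviceId
        AND H2.TimeStamp < F.failed_at
  );

Against the sample data this yields 40 for device 2 (1:43:00), 32 for device 1 (1:44:00), and 20 for device 3 (1:46:00), matching the desired output.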

Select record with max value from each group with Query DSL

I have a score table with players' scores, and I want to select, for each player, the single record with the biggest score.
Here is the table:
id | player_id | score | ...
1  | 1         | 10    | ...
2  | 2         | 21    | ...
3  | 3         | 9     | ...
4  | 1         | 30    | ...
5  | 3         | 2     | ...
Expected result:
id | player_id | score | ...
2  | 2         | 21    | ...
3  | 3         | 9     | ...
4  | 1         | 30    | ...
I can achieve that with pure SQL like this:
SELECT *
FROM player_score ps
WHERE ps.score = (
    SELECT max(ps2.score)
    FROM player_score ps2
    WHERE ps2.player_id = ps.player_id
)
Can you tell me how to achieve the same query with Querydsl? I found some solutions with JPASubQuery, but that class doesn't work for me (my IDE cannot resolve it). I am using querydsl 4.x. Thank you in advance.
JPASubQuery has been removed in querydsl 4. Use JPAExpressions.select instead, with a second alias (e.g. new QPlayerScore("playerScore2")) for the subquery. Your WHERE clause should look something like this:
.where(playerScore.score.eq(
    JPAExpressions.select(playerScore2.score.max())
        .from(playerScore2)
        .where(playerScore2.playerId.eq(playerScore.playerId))))
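As an aside, on databases with window functions the same one-row-per-player result can be written in plain SQL with ROW_NUMBER(); unlike the correlated MAX version, it returns exactly one row per player even when scores tie - a sketch:

SELECT id, player_id, score
FROM (
    SELECT ps.*,
           ROW_NUMBER() OVER (PARTITION BY player_id ORDER BY score DESC) AS rn
    FROM player_score ps
) ranked
WHERE rn = 1;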

How do I do multiple selection based on a flowchart of criteria?

Table name: Copies
+----------+-------+----------+---------+--------------+-------------+
| group_id | my_id | previous | in_this | higher_value | most_recent |
+----------+-------+----------+---------+--------------+-------------+
| 900      | 1     | null     | Y       | 7            | May 16      |
| 900      | 2     | null     | Y       | 3            | Oct 16      |
| 900      | 3     | null     | N       | 9            | Oct 16      |
| 901      | 4     | 378      | Y       | 3            | Oct 16      |
| 901      | 5     | null     | N       | 2            | Oct 16      |
| 902      | 6     | null     | N       | 5            | May 16      |
| 902      | 7     | null     | N       | 9            | Oct 16      |
| 903      | 8     | null     | Y       | 3            | Oct 16      |
| 903      | 9     | null     | Y       | 3            | May 16      |
| 904      | 10    | null     | N       | 0            | May 16      |
| 904      | 11    | null     | N       | 0            | May 16      |
+----------+-------+----------+---------+--------------+-------------+
Output table
+----------+-------+----------+---------+--------------+-------------+
| group_id | my_id | previous | in_this | higher_value | most_recent |
+----------+-------+----------+---------+--------------+-------------+
| 900      | 1     | null     | Y       | 7            | May 16      |
| 902      | 7     | null     | N       | 9            | Oct 16      |
| 903      | 8     | null     | Y       | 3            | Oct 16      |
+----------+-------+----------+---------+--------------+-------------+
Hi all, I need help with a query that returns one record within each group based on the importance of the fields. The importance is ranked as follows:
previous - if any record within a group_id has a non-null previous, then no record in that group_id is returned (because, according to our rules, all records within a group should have the same previous value).
in_this - if one record is Y and the other is N within a group_id, we keep the Y; if all records are Y or all are N, we move to the next attribute.
higher_value - if all records are equal on in_this, we select the record with the greater value in this field; if the values are equal, we move to the next attribute.
most_recent - if all records are equal on higher_value, we take the newest record; if these are equal too, nothing is returned.
This is a simplified version of the table I am looking at, but I just would like to get the gist of how something like this would work. Basically, my table has multiple copies of records that have been grouped by some algorithm, and I have been tasked with selecting which record within each group is the 'good' one, based on these fields.
I'd like the output to show all fields, because I will likely refine the query to include other fields (there are over 40 to consider), but the most important are the group_id and my_id fields. It would be neat if we could also somehow flag why each record got picked, but that isn't necessary.
It seems like something like this should be easy, but I have a hard time wrapping my head around how to pick from within a group_id. Thanks for your help.
You can use analytic functions for this. The trick is establishing the right variables for each condition:
select t.*
from (select t.*,
             max(in_this) over (partition by group_id) as max_in_this,
             min(higher_value) over (partition by group_id) as min_higher_value,
             max(higher_value) over (partition by group_id) as max_higher_value,
             row_number() over (partition by group_id, higher_value order by my_id) as seqnum_ghv,
             min(most_recent) over (partition by group_id) as min_most_recent,
             max(most_recent) over (partition by group_id) as max_most_recent,
             row_number() over (partition by group_id order by most_recent) as seqnum_mr
      from t
     ) t
where max_in_this is not null and
      ( (min_higher_value <> max_higher_value and seqnum_ghv = 1) or
        (min_higher_value = max_higher_value and min_most_recent <> max_most_recent and seqnum_mr = 1)
      );
The third condition as stated makes no sense, but you should get the idea for how to implement this.

Error in executing two groupbys in sparkSQL

I am new to SparkSQL and I was trying out some queries with it.
This is the query I am trying to execute:
sqlContext.sql("SELECT id, category, AVG(mark) FROM data GROUP BY id, category")
I am not getting proper output when I run the query: instead of the actual values of category, I get values like 1, 2, 3.
I have been stuck on this weird error for a long time.
However, a simple select statement and a single group by work perfectly:
sqlContext.sql("SELECT id, category FROM data")
sqlContext.sql("SELECT id, AVG(mark) FROM data GROUP BY id")
What is wrong? Does SparkSQL have a problem with multiple group by columns?
Right now I am running this complex query instead:
sqlContext.sql("SELECT data.id, data.category, AVG(id_avg.met_avg) FROM (SELECT id, AVG(mark) AS met_avg FROM data GROUP BY id) AS id_avg, data GROUP BY data.category, data.id")
This works, but takes much longer to execute.
Please help.
Sample data:
| id | category | marks
| 1  | a        | 40
| 2  | b        | 44
| 3  | a        | 50
| 4  | b        | 40
| 1  | a        | 30
The output should be:
| id | category | avg
| 1  | a        | 35
| 2  | b        | 44
| 3  | a        | 50
| 4  | b        | 40
Please try this query:
SELECT
data.id
, data.category
, AVG(mark)
FROM data
GROUP BY
data.id
, data.category
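(In the asker's environment the statement would be passed to sqlContext.sql as a single quoted string:)

sqlContext.sql("SELECT data.id, data.category, AVG(mark) FROM data GROUP BY data.id, data.category")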
Based on this sample data:
| id | category | marks
| 1  | a        | 40
| 2  | b        | 44
| 3  | a        | 50
| 4  | b        | 40
| 1  | a        | 30
The output WILL be this:
| id | category | avg
| 1  | a        | 35
| 2  | b        | 44
| 3  | a        | 50
| 4  | b        | 40
Also, the following expected row cannot be produced using GROUP BY:
| 5  | a        | 30
That is a bug in SparkSQL; try using the next version, where it is fixed.
I got the proper output by using spark-1.0.2.
It worked with pure Scala code as well. Try either of them :)

How to count distinct records

Could anybody please help me with an SQL command?
I have a table (tbl_sActivity) that has the data below:
user_id | client_id | act_status
1       | 7         | cold
1       | 7         | dealed
22      | 5         | cold
1       | 6         | cold
1       | 6         | warm
1       | 6         | hot
1       | 6         | dealed
1       | 8         | warm
1       | 8         | dealed
21      | 4         | warm
21      | 4         | dealed
The output should be:
user_id | Count_C_id
1       | 3
21      | 1
22      | 1
I've searched the net and learnt that MS Access cannot use the COUNT(DISTINCT) function, so I've been stuck at this stage for days.
Try this one. The "trick" is to have a subquery first to get all the distinct combinations of user and client IDs and then do the grouping per user:
SELECT
user_id
, COUNT(*) AS count_distinct_clients
FROM
( SELECT DISTINCT
user_id,
client_id
FROM tbl_sActivity
) AS tmp
GROUP BY
user_id ;
My recommendation is to write the query without using a subquery.
The code below should be faster and more accurate than the subquery version.
-- Temp table (note: this is SQL Server syntax; Access itself has no temp tables or COUNT(DISTINCT))
CREATE TABLE #TempStudent(userId int, c_id int, Name varchar(MAX))

SELECT userId, COUNT(DISTINCT c_id) AS Count_C_id
FROM #TempStudent
GROUP BY userId