I have a SQL query problem in which I need to find out which users have passed all of the tests.
Here are the tables:
Notation: nameoftable(nameofcolumns)
user(userid,password)
test(test#,creditpertrue,negperfalse,total,neededcredit)
{creditpertrue = points awarded per right answer, negperfalse = points deducted per wrong answer,
total = the total points of a test, and neededcredit = the points needed to pass the test}
question(test#,q#,truechoice#)
user_test(userid,test#)
{this table shows which user has taken which test (a user can take a test only once)}
user_test_question(uesrid,test#,q#,userchoice#)
{this table shows the choice of every user for every question of every test, which may be right or wrong}
Now the question is this:
Find the userid of the users that have passed all of the tests.
A solution that came to my mind was this:
Create 2 views like this:
view1: userid, test#, numberofrightchoices
and
view2: userid, test#, numberofwrongchoices
and then run a select on these 2 views with the condition numberofrightchoices*creditpertrue - numberofwrongchoices*negperfalse >= neededcredit
Is it possible?
I think the query you need to get a list of users who have passed all the tests would be this:
select distinct userid
from user_test
where userid not in (
    -- users who scored below neededcredit on at least one test
    select Test_Results.uesrid
    from
    (select t0.test#, t2.uesrid, t0.neededcredit,
            sum(case when t1.truechoice# = t2.userchoice# then t0.creditpertrue else -t0.negperfalse end) as User_Test_Points
     from test t0 left join
          question t1 on (t0.test# = t1.test#) left join
          user_test_question t2 on (t1.test# = t2.test# and t1.q# = t2.q#)
     where t2.uesrid is not null
     group by t0.test#, t2.uesrid, t0.neededcredit) Test_Results
    where Test_Results.User_Test_Points < Test_Results.neededcredit )
Note that this returns every user who has not failed a test they took; it does not check that the user has taken every test.
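To check the scoring logic on a small data set, here is a minimal sketch using Python's built-in sqlite3 module. It is not the poster's schema verbatim: the column names test_no, q_no, truechoice and userchoice are stand-ins (SQLite does not accept # in unquoted identifiers), the sample rows are invented, and it takes the stricter reading that a user must have a passing score on every row of the test table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE test (test_no INTEGER PRIMARY KEY, creditpertrue INTEGER,
                   negperfalse INTEGER, neededcredit INTEGER);
CREATE TABLE question (test_no INTEGER, q_no INTEGER, truechoice INTEGER);
CREATE TABLE user_test_question (userid TEXT, test_no INTEGER,
                                 q_no INTEGER, userchoice INTEGER);
-- Invented sample data: two tests, three users.
INSERT INTO test VALUES (1, 2, 1, 3), (2, 2, 1, 2);
INSERT INTO question VALUES (1, 1, 1), (1, 2, 2), (2, 1, 3);
INSERT INTO user_test_question VALUES
    ('ann', 1, 1, 1), ('ann', 1, 2, 2), ('ann', 2, 1, 3),
    ('bob', 1, 1, 1), ('bob', 1, 2, 4), ('bob', 2, 1, 3),
    ('cat', 1, 1, 1), ('cat', 1, 2, 2);
""")

QUERY = """
WITH scores AS (
    -- One row per (user, test) with the points that user earned.
    SELECT a.userid, a.test_no, t.neededcredit,
           SUM(CASE WHEN a.userchoice = q.truechoice
                    THEN t.creditpertrue ELSE -t.negperfalse END) AS points
    FROM user_test_question a
    JOIN question q ON q.test_no = a.test_no AND q.q_no = a.q_no
    JOIN test t ON t.test_no = a.test_no
    GROUP BY a.userid, a.test_no, t.neededcredit
)
SELECT userid
FROM scores
WHERE points >= neededcredit
GROUP BY userid
-- Keep only users whose number of passed tests equals the number of tests.
HAVING COUNT(*) = (SELECT COUNT(*) FROM test)
ORDER BY userid
"""

passed_all = [row[0] for row in conn.execute(QUERY)]
print(passed_all)
```

In this sample, bob fails test 1 and cat never takes test 2, so only ann is returned. The single conditional SUM replaces the two views from the question, since right and wrong answers can be scored in one pass.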
I have a table named Ticket Numbers, which (for this example) contains the columns:
Ticket_Number
Assigned_Group
Assigned_Group_Sequence_No
Reported_Date
Each ticket number could contain 4 rows, depending on how many times the ticket changed assigned groups. Some of these rows could contain an assigned group of "Desktop Support," but some may not. Here is an example:
Example of raw data
What I am trying to accomplish is to get an output that contains every ticket number that has a 'Desktop Support' row, together with the assigned group of its max sequence number. Here is what I am trying to accomplish with SQL:
Queried Data
I'm trying to use SQL with the following query but have no clue what I'm doing wrong:
select ih.incident_number,ih.assigned_group, incident_history2.maxseq, incident_history2.assigned_group
from incident_history_public as ih
left join
(
select max(assigned_group_seq_no) maxseq, incident_number, assigned_group
from incident_history_public
group by incident_number, assigned_group
) incident_history2
on ih.incident_number = incident_history2.incident_number
and ih.assigned_group_seq_no = incident_history2.maxseq
where ih.ASSIGNED_GROUP LIKE '%DS%'
Does anyone know what I am doing wrong?
You might want to create a proper alias for incident_history. e.g.
from incident_history as incident_history1
and
on incident_history1.ticket_number = incident_history2.ticket_number
and incident_history1.assigned_group_seq_no = incident_history2.maxseq
In my humble opinion a first error could be that I don't see any column named "incident_history2.assigned_group".
I would try to use a common table expression to get only the ticket numbers that contain "Desktop Support":
WITH desktop AS (
    SELECT DISTINCT Ticket_Number
    FROM incident_history
    WHERE Assigned_Group = 'Desktop Support'
),
Then an inner join of that result with your inner query gives the ticket number and maxseq, so that in a second step you can also get the "MAX_Group":
tmp AS (
    SELECT i2.Ticket_Number, i2.maxseq
    FROM desktop D inner join
         (SELECT Ticket_Number, max(assigned_group_seq_no) as maxseq
          FROM incident_history
          GROUP BY Ticket_Number) as i2
         ON D.Ticket_Number = i2.Ticket_Number
)
SELECT i.Ticket_Number, i.Assigned_Group as MAX_Group, T.maxseq, i.Reported_Date
FROM tmp T inner join incident_history i
     ON T.Ticket_Number = i.Ticket_Number and i.assigned_group_seq_no = T.maxseq
I think there are several different ways to solve this, but I really hope this is helpful for you!
For more information about Common Table Expression: https://www.essentialsql.com/introduction-common-table-expressions-ctes/
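As a rough check of the chained-CTE approach, here is a small sketch run against SQLite from Python. The rows are invented for illustration, and the second CTE is named last_seq here (a hypothetical name) rather than tmp.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE incident_history (Ticket_Number TEXT, Assigned_Group TEXT,
                               Assigned_Group_Seq_No INTEGER, Reported_Date TEXT);
-- Invented rows: T1 and T3 touch Desktop Support, T2 never does.
INSERT INTO incident_history VALUES
    ('T1', 'Service Desk',    1, '2020-01-01'),
    ('T1', 'Desktop Support', 2, '2020-01-01'),
    ('T1', 'Network',         3, '2020-01-01'),
    ('T2', 'Service Desk',    1, '2020-01-02'),
    ('T2', 'Network',         2, '2020-01-02'),
    ('T3', 'Desktop Support', 1, '2020-01-03');
""")

QUERY = """
WITH desktop AS (
    -- Tickets that were assigned to Desktop Support at least once.
    SELECT DISTINCT Ticket_Number
    FROM incident_history
    WHERE Assigned_Group = 'Desktop Support'
),
last_seq AS (
    -- Highest sequence number per ticket, grouped by ticket only.
    SELECT Ticket_Number, MAX(Assigned_Group_Seq_No) AS maxseq
    FROM incident_history
    GROUP BY Ticket_Number
)
SELECT i.Ticket_Number, i.Assigned_Group AS MAX_Group, l.maxseq
FROM desktop d
JOIN last_seq l ON l.Ticket_Number = d.Ticket_Number
JOIN incident_history i
  ON i.Ticket_Number = l.Ticket_Number
 AND i.Assigned_Group_Seq_No = l.maxseq
ORDER BY i.Ticket_Number
"""

rows = conn.execute(QUERY).fetchall()
print(rows)
```

The key difference from the query in the question is that the maximum is grouped by ticket only. Grouping by assigned group as well yields one maximum per group per ticket, which is why the original join does not isolate the last group.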
I'm trying to get the difference in time by user between the first step checkout and final purchase. This is my query:
SELECT transactionid1,MAX((t1.hit_moment1-t2.hit_moment2)) as diff_hits,MAX(t2.checkout_step_2) as day FROM ((SELECT clientId as client1_id,
hits_1.page.pagePath as page_event1,
hits_1.eventInfo.eventAction as action_event1,
hits_1.transaction.transactionId as transactionId1,
TIMESTAMP_SECONDS(visitStartTime) as checkout_step_1,
hits_1.hour as hour1,
hits_1.minute as minute1,
(hits_1.hour*60+hits_1.minute) as hit_moment1
from `616180.ga_sessions_*` ,
UNNEST(hits) as hits_1 where hits_1.page.pagePath like '%/buy1/suscription%' and hits_1.eventInfo.eventAction="Transaction" and hits_1.transaction.transactionId is not null)t1 INNER JOIN (SELECT clientId as client2_id,
hits_2.page.pagePath as page_event2,
hits_2.eventInfo.eventAction as action_event2,
TIMESTAMP_SECONDS(visitStartTime) as checkout_step_2,
hits_2.hour as hour2,
hits_2.minute as minute2,
(hits_2.hour*60+hits_2.minute) as hit_moment2
from `616180.ga_sessions_*` ,UNNEST(hits) as hits_2 where hits_2.page.pagePath like '%/buy4/suscription%' and hits_2.eventInfo.eventAction="Checkout" )t2 on t1.client1_id=t2.client2_id) where (t1.hit_moment1-t2.hit_moment2)>0 and (t1.hit_moment1-t2.hit_moment2)<180 group by transactionId1 order by transactionid1
Here, a pagePath containing /buy1/suscription represents the transaction event, and a pagePath containing /buy4/suscription represents the first checkout step. I get results, but many of them are extremely large periods of time. Have I made a mistake?
Thank you.
I don't fully follow what the sample data looks like or exactly what format you want for the result set.
That said, you can use aggregation to do the calculation you want. The following assumes that the checkout is after the transaction, but it gives the basic idea:
select s.transaction_id,
       max(hit.hour * 60 + hit.minute) - min(hit.hour * 60 + hit.minute) as diff_minute
from `616180.ga_sessions_*` s cross join
     unnest(s.hits) as hit
where (hit.page.pagePath like '%/buy1/suscription%' and
       hit.eventInfo.eventAction = 'Transaction'
      ) or
      (hit.page.pagePath like '%/buy4/suscription%' and
       hit.eventInfo.eventAction = 'Checkout'
      )
group by s.transaction_id;
It's been a while since I've done SQL, but I have a rather pressing issue.
My db-layout is as follows:
Now, starting from a Users.ID, I want to get all the Rounds the user has played. The user could be Hosts.HostID, Hosts.GuestID, or even both. Where he is both, it should not show up in the results.
The results I need from the Query are the Hosts.Name and all the fields of Rounds. In general what I want to do is display a list of all the Hosts (actually these are the Games) in which the user has participated, as a Host or as a Guest, along with perhaps a total score. When clicking on this, some dropdown will appear showing the individual round scores, words, ...
Now I was wondering whether this was possible in a single query. Of course I could do a query getting all the Hosts and then per Host a query for each Round, but that doesn't seem that performant. This is what I've come up with so far:
SELECT Rounds.ID, Rounds.GameID, Rounds.Round, Rounds.Score, Rounds.Word
, Hosts.ID, Hosts.HostID, Hosts.GuestID
FROM Rounds INNER JOIN Hosts
ON Rounds.GameID = Hosts.ID
INNER JOIN Users
ON Hosts.hostID = Users.ID
WHERE Users.ID = 5
The issue is however that it doesn't filter out where the user is both host AND guest, and I can't seem to Group it by Hosts.ID either.
Add Hosts.hostID <> Hosts.guestID to the where clause.
If you are using SQL Server 2005 or a later version, you could modify your present query like this:
SELECT Rounds.ID, Rounds.GameID, Rounds.Round, Rounds.Score, Rounds.Word
, Hosts.ID, Hosts.HostID, Hosts.GuestID
FROM Rounds INNER JOIN Hosts
ON Rounds.GameID = Hosts.ID
CROSS APPLY
(
SELECT Hosts.HostID
UNION
SELECT Hosts.GuestID
) AS u (UserID)
WHERE u.UserID = 5
;
The CROSS APPLY clause would produce either one or two rows, depending on whether HostID and GuestID are equal. In either event, the WHERE condition would ultimately leave at most one. Thus, the above query would give you only the games (with all their rounds) where the specified user participated.
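CROSS APPLY is specific to SQL Server, so the sketch below does not reproduce it. Instead it tests the same "host or guest" condition with a plain IN predicate, which is a different technique that also returns each game at most once. It runs on SQLite through Python, with invented rows and a trimmed Rounds table (only ID, GameID and Score).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Hosts (ID INTEGER PRIMARY KEY, Name TEXT,
                    HostID INTEGER, GuestID INTEGER);
CREATE TABLE Rounds (ID INTEGER PRIMARY KEY, GameID INTEGER, Score INTEGER);
-- Invented rows: user 5 hosts game 1, is a guest in game 2,
-- is both host and guest in game 3, and is absent from game 4.
INSERT INTO Hosts VALUES (1, 'G1', 5, 7), (2, 'G2', 8, 5),
                         (3, 'G3', 5, 5), (4, 'G4', 7, 8);
INSERT INTO Rounds VALUES (1, 1, 10), (2, 1, 20), (3, 2, 5),
                          (4, 3, 7), (5, 4, 9);
""")

QUERY = """
SELECT h.ID, h.Name, SUM(r.Score) AS TotalScore
FROM Rounds r
JOIN Hosts h ON r.GameID = h.ID
-- True when the user is the host, the guest, or both; a game row is
-- matched once either way, so no duplicates need to be removed.
WHERE ? IN (h.HostID, h.GuestID)
GROUP BY h.ID, h.Name
ORDER BY h.ID
"""

user_id = 5
games = conn.execute(QUERY, (user_id,)).fetchall()
print(games)
```

If games where the user is both host and guest should be dropped entirely rather than shown once, adding h.HostID <> h.GuestID to the WHERE clause would exclude game 3 here.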
And from that query you could easily get to something like this:
SELECT Hosts.ID,
TotalScore = SUM(Rounds.Score)
FROM
...
GROUP BY
Hosts.ID
;
I have the following tables - simplified quite a bit
Table - Tests
Test
A
B
C
D
E
F
G
H
Table - TestHistory
Test Result Version
A Pass 1
A Fail 2
B Pass 2
C Fail 1
C Pass 2
D Fail 1
D Fail 2
E Fail 1
I want to get the list of tests that failed (or any status) the last time they ran. But, also the version that it was found in.
So, in the above example, I want this returned:
A Fail 2
D Fail 2
E Fail 1
I've tried a couple methods
select Test, LastResult = IsNull((Select Top 1 Result From TestHistory Where Test = Tests.Test order by Version desc), 'NOT_RUN')
from Tests
This gives me a list of all tests, and then I have to go through and kick out the rows I don't want (i.e. those that aren't Fail). It also doesn't give me the Version the test ran in.
I also tried this:
select Version, TH.Test, Result
from TestHistory as TH inner join Tests as T on TH.Test = T.Test
where Result = 'Fail'
But, then I get rows such as:
Test Result Version
C Fail 1
I don't want those because it's not the Last Result.
How can I restrict this to give me exactly what I need without a lot of data manipulation (or worse, more DB reads) after? Any help would be appreciated. Thanks!
I can't syntax check this, but it should be close:
SELECT
th.Test,
th.Result,
th.Version
FROM
TestHistory th
INNER JOIN
(
SELECT
MAX(Version) as MaxVersion,
Test
FROM
TestHistory
GROUP BY
Test
) sub ON sub.MaxVersion = th.Version AND sub.Test = th.Test
WHERE
th.result = 'Fail'
Explanation: First, in the subquery, you get the maximum version for the test. Then use a join to restrict the outer query to only return the results that match the test/version of the subquery.
Edit: forgot the WHERE clause--seems you only want rows where the most recent result is failure.
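The join against the per-test maximum version can be tried out on the sample data from the question. This is a minimal sketch using SQLite through Python; only the ORDER BY is added so that the output order is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TestHistory (Test TEXT, Result TEXT, Version INTEGER);
-- Sample rows copied from the question.
INSERT INTO TestHistory VALUES
    ('A', 'Pass', 1), ('A', 'Fail', 2),
    ('B', 'Pass', 2),
    ('C', 'Fail', 1), ('C', 'Pass', 2),
    ('D', 'Fail', 1), ('D', 'Fail', 2),
    ('E', 'Fail', 1);
""")

QUERY = """
SELECT th.Test, th.Result, th.Version
FROM TestHistory th
INNER JOIN (
    -- Latest version in which each test ran.
    SELECT Test, MAX(Version) AS MaxVersion
    FROM TestHistory
    GROUP BY Test
) sub ON sub.Test = th.Test AND sub.MaxVersion = th.Version
-- Filter after the join, so only the latest run is inspected.
WHERE th.Result = 'Fail'
ORDER BY th.Test
"""

latest_failures = conn.execute(QUERY).fetchall()
print(latest_failures)
```

Test C is excluded because its version 1 failure is superseded by a pass in version 2, which matches the output the question asks for.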
Edit based on the question in your comment:
This should give you the tests whose most recent result is a failure, plus tests that have never run. Note that this will filter out tests whose most recent run did not fail (B and C in your data). I based this on my original query in the interest of time, but I would guess there is a more elegant way:
SELECT
t.Test,
outerSub.Result,
outerSub.Version
FROM
Tests t
LEFT JOIN
(SELECT
th.Test,
th.Result,
th.Version
FROM
TestHistory th
INNER JOIN
(
SELECT
MAX(Version) as MaxVersion,
Test
FROM
TestHistory
GROUP BY
Test
) sub ON sub.MaxVersion = th.Version AND sub.Test = th.Test
) outerSub on outerSub.Test = t.Test
WHERE
outerSub.result = 'Fail' OR outerSub.Test IS NULL
A small improvement can be made to the above solution.
If you need the results in Test order, the query can be rewritten as below:
SELECT src.Test, src.Result, src.Version
FROM
(
SELECT th.Version, th.Test, th.Result,
ROW_NUMBER() over(partition by th.Test order by th.Version desc) as RowNum
FROM dbo.TestHistory as th
) src
WHERE src.RowNum = 1 and src.Result = 'Fail'
order by src.Test;
In addition, the query returns the columns in the required order.
How about:
select th.* from testHistory th
where th.result = 'fail' -- this part, according to you, being optional
and th.version =
(select max(t.version) from testhistory t
where t.test = th.test);
I have a large table of events. Per user, I want to count the occurrences of type A events before the earliest type B event.
I am searching for an elegant query. Hive is used, so I can't do subqueries.
Timestamp Type User
... A X
... A X
... B X
... A X
... A X
... A Y
... A Y
... A Y
... B Y
... A Y
Wanted Result:
User Count_Type_A
X 2
Y 3
I could get the "cut-off" timestamp by doing:
Select User, min(Timestamp)
Where Type=B
Group BY User;
But then how can I use that information inside the next query where I want to do something like:
SELECT User, count(Timestamp)
WHERE Type=A AND Timestamp<min(User.Timestamp_Type_B)
GROUP BY User;
My only idea so far is to determine the cut-off timestamps first, then join them with all type A events and select from the resulting table, but that feels wrong and would look ugly.
I'm also considering the possibility that this is the wrong type of problem/analysis for Hive and that I should consider hand-written map-reduce or pig instead.
Please help me by pointing in the right direction.
First Update:
In response to Cilvic's first comment to this answer, I've adjusted my query to the following based on workarounds suggested in the comments found at https://issues.apache.org/jira/browse/HIVE-556:
SELECT main.[User], COUNT(main.[Timestamp]) AS [Before_First_B_Count]
FROM [Dataset] main
CROSS JOIN (SELECT [User], min([Timestamp]) [First_B_TS] FROM [Dataset]
WHERE [Type] = 'B'
GROUP BY [User]) sub
WHERE main.[Type] = 'A'
AND (sub.[User] = main.[User])
AND (main.[Timestamp] < sub.[First_B_TS])
GROUP BY main.[User]
Original:
Give this a shot:
SELECT main.[User], COUNT(main.[Timestamp]) AS [Before_First_B_Count]
FROM [Dataset] main
JOIN (SELECT [User], min([Timestamp]) [First_B_TS] FROM [Dataset]
WHERE [Type] = 'B'
GROUP BY [User]) sub
ON (sub.[User] = main.[User]) AND (main.[Timestamp] < sub.[First_B_TS])
WHERE main.[Type] = 'A'
GROUP BY main.[User]
I did my best to follow Hive syntax. Let me know if you have any questions. I would like to know why you wish/need to avoid a subquery.
In general, I +1 coge.soft's solution. Here it is again for your reference:
SELECT main.[User], COUNT(main.[Timestamp]) AS [Before_First_B_Count]
FROM [Dataset] main
JOIN (SELECT [User], min([Timestamp]) [First_B_TS] FROM [Dataset]
WHERE [Type] = 'B'
GROUP BY [User]) sub
ON (sub.[User] = main.[User]) AND (main.[Timestamp] < sub.[First_B_TS])
WHERE main.[Type] = 'A'
GROUP BY main.[User]
However, a couple things to note:
What happens when there are no B events? Assuming you would want to count all the A events per user in that case, an inner join as specified in the solution wouldn't work, since there would be no entry for that user in the sub table. You would need to change to a left outer join for that.
The solution also does 2 passes over the data: one to populate the sub table, the other to join the sub table with the main table. Depending on your notion of performance and efficiency, there is an alternative where you could do this in a single pass over the data. You can distribute the data by user using Hive's distribute by functionality and write a custom reducer that would do your count calculation in your favorite language using Hive's transform functionality.
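The counting logic of such a reducer could look roughly like the sketch below. It is plain Python with no Hive wiring: the function name count_a_before_first_b and the (timestamp, type, user) tuple layout are assumptions for illustration, and it assumes each user's rows arrive already sorted by timestamp, as they would after distributing by user and sorting by timestamp. Users without any B event get all of their A events counted, in line with the left outer join remark above.

```python
def count_a_before_first_b(rows):
    """Count type A events per user that occur before that user's first B.

    rows: iterable of (timestamp, event_type, user) tuples, assumed to be
    sorted by timestamp within each user (hypothetical input layout).
    """
    counts = {}
    seen_b = set()  # users whose first B event has already been reached
    for _timestamp, event_type, user in rows:
        counts.setdefault(user, 0)
        if user in seen_b:
            continue  # everything after the first B is ignored
        if event_type == "B":
            seen_b.add(user)
        elif event_type == "A":
            counts[user] += 1
    return counts


# Sample events mirroring the question, plus user Z who has no B event.
events = [
    (1, "A", "X"), (2, "A", "X"), (3, "B", "X"), (4, "A", "X"), (5, "A", "X"),
    (1, "A", "Y"), (2, "A", "Y"), (3, "A", "Y"), (4, "B", "Y"), (5, "A", "Y"),
    (1, "A", "Z"), (2, "A", "Z"),
]
result = count_a_before_first_b(events)
print(result)
```

Each row is inspected exactly once and only a counter plus a flag per user is kept, which is what makes the single-pass variant possible. A real streaming script would read tab-separated lines from standard input and write the counts to standard output instead of taking a list.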