Calculate time spend in a state using hive - hive

I am working on a hive query where I need to calculate the time spent by a user in a state.
The data looks something like this:
I am trying to get the output like this:
I tried the script like this:
select
case when a.oldstatus = Registered and b.newstatus in
('Active','Suspended','Reactive') then
b.stats_changets-a.statuschagets
... so on for each state
end status,
count(*) as WithinXRange
from
MyTable a,b
where a.id = b.id
group by event_dt
So the b.statuschgts - a.statuschgts gives us or no of days which in turn I use in query to put it in different buckets as shown in 2nd image.
Can you please help me with this query.

Related

Quick one on Big Query SQL-Ecommerce Data

I am trying to replicate the Google Analyitcs data in Big Query but couldnt do that.
Basically I am using Custom Dimension 40 (user subscription status)
but I am getting wrong numbers in BQ.
Can someone help me on this?
I am using this query but couldn't find it out the exact one.
SELECT
(SELECT value FROM hits.customDimensions where index=40) AS UserStatus,
COUNT(hits.transaction.transactionId) AS Unique_Purchases
FROM
`xxxxxxxxxxxxx.ga_sessions_2020*` AS GA, --new rollup
UNNEST(GA.hits) AS hits
WHERE
(SELECT value FROM hits.customDimensions where index=40) IN ("xx001","xxx002")
GROUP BY 1
I am getting this from big query which is wrong.
I have check out the dates also but dont know why its wrong.
Your question is rather unclear. But because you want something to be unique and numbers are mysteriously not what you want, I would suggest using COUNT(DISTINCT):
COUNT(DISTINCT hits.transaction.transactionId) AS Unique_Purchases
As far as I understand, you imported Google Analytics data into Bigquery and you are trying to group the custom dimension with index 40 and values ("xx001","xxx002") in order to know how many hit transactions were performed in function of these dimension values.
Replicating your scenario and trying to execute the query you posted, I got the following error.
However, I created a query that could help with your use-case. At first, it selects the transactionId and dimension values with the transactionId different from null and with index value equal to 40, then the grouping is done by the dimension value, filtered with values equals to "xx001"&"xxx002".
WITH tx AS (
SELECT
HIT.transaction.transactionId,
CD.value
FROM
`xxxxxxxxxxxxx.ga_sessions_2020*` AS GA,
UNNEST(GA.hits) AS HIT,
UNNEST(HIT.customDimensions) AS CD
WHERE
HIT.transaction.transactionId IS NOT NULL
AND
CD.index = 40
)
SELECT tx.value AS UserStatus, count(tx.transactionId) AS Unique_Purchases
FROM tx
WHERE tx.value IN ("xx001","xx002")
GROUP BY tx.value
For further details about the format and schema of the data that is imported into BigQuery, I found this document.

How to get an group data from sql?

Can anyone help me create chart like the one below? I'm using CFDB on wordpres. It is a simple form inputs counter.
I've figured out something like this:
SELECT month(FROM_UNIXTIME(`submit_time`)) as miesiac,
year(FROM_UNIXTIME(`submit_time`)) as rok,
`form_name`, `field_name`, `field_value`, `field_order`, `file`
FROM `wp_cf7dbplugin_submits`
WHERE year(FROM_UNIXTIME(`submit_time`)) = 2016
I would like to get final result like in the attachment.
Now I get something like this one:
enter image description here
In order to generate the data needed for a chart like the one in your example, you need to return the number of form submissions for each month and form type.
SELECT COUNT(sub.form_name) as total, sub.form_name, sub.miesiac
FROM (
SELECT DISTINCT `submit_time`, month(FROM_UNIXTIME(`submit_time`)) as miesiac,
`form_name`
FROM `wp_cf7dbplugin_submits`
WHERE year(FROM_UNIXTIME(`submit_time`)) = 2016 ) sub
GROUP BY sub.form_name, sub.miesiac
The sub-query identifies the distinct submissions (since each submission has multiple rows) and the main query counts the number of submissions for each form type per month. There's no need to include the year because it's already included in the WHERE statement.

How to use aggregate function to filter a dataset in ssrs 2008

I have a matrix in ssrs2008 like below:
GroupName Zone CompletedVolume
Cancer 1 7
Tunnel 1 10
Surgery 1 64
ComplatedVolume value is coming by a specific expression <<expr>>, which is equal to: [Max(CVolume)]
This matrix is filled by a stored procedure that I am not supposed to change if possible. What I need to do is that not to show the data whose CompletedVolume is <= 50. I tried to go to tablix properties and add a filter like [Max(Q9Volume)] >= 50, but when I try to run the report it says that aggregate functions cannot be used in dataset filters or data region filters. How can I fix this as easy as possible?
Note that adding a where clause in sql query would not solve this issue since there are many other tables use the same SP and they need the data where CompletedVolume <= 50. Any help would be appreciated.
EDIT: I am trying to have the max(Q9Volume) value on SP, but something happening I have never seen before. The query is like:
Select r.* from (select * from results1 union select * from results2) r
left outer join procedures p on r.pid = p.id
The interesting this is there are some columns I see that does not included by neither results1/results2 nor procedures tables when I run the query. For example, there is no column like Q9Volume in the tables (result1, result2 and procedures), however when I run the query I see the columns on the output! How is that possible?
You can set the Row hidden property to True when [Max(CVolume)] is less or equal than 50.
Select the row and go to Row Visibility
Select Show or Hide based on an expression option and use this expression:
=IIF(
Max(Fields!Q9Volume.Value)<=50,
True,False
)
It will show something like this:
Note maximum value for Cancer and Tunnel are 7 and 10 respectively, so
they will be hidden if you apply the above expression.
Let me know if this helps.

SQL query to produce a time x day grid from a list of timestamps?

Structures of my tables are as follow.
Table Name : timetable
timetable http://www.4shared.com/download/MYafV7-6ce/timetableTable.png
Table Name : slot_table
timetable http://www.4shared.com/download/9Lp_CBn2ba/slot_table.png
Table Name : instructor(this table is not required for this particular problem)
I want to show the resultant data in my android app in a timetable format somewhat like this:
random http://www.4shared.com/download/oAGiUXVAba/random.png
Question : What query i should write so that subjects of particular days with respective slots will be the result of the query?
1)The days should be in order like monday,tuesday,wednesday.
2)If monday has 2 subjects in 2 different slots then it should display like this :
Day 7:30-9:10AM 9:20-11:00AM
Monday Android Workshop Operating System
This is just a sample.
P.S:As timetable format is required,all the subjects with slot ids of all the days(monday to saturday) must be there in it.
Edit :
I tried this
select day,subject,slot from timetable,slot_table where timetable.slotid = slot_table.slotid
which gave a result :
a http://www.4shared.com/download/uMU7NA8Oce/random1.png
But i want it in a timetable format which i am not having an idea how to do that.
Edit :
Timetable sample format is something like this :
Edit :
I wrote a query
select timetable.day,count(slot_table.subject) as no_of_classes from timetable,slot_table where timetable.slotid = slot_table.slotid group by timetable.day
which resulted in
a http://www.4shared.com/download/rZW20_g8ce/random2.png
So now it shows monday has 2 classes in 2 slots,Tuesday has 1 class in 1 slot and so on.
Now any help on a query which can show the two slots(timings) on monday?
Solution :
select timetable.day,max(case when (slot='7:30-9:10AM') then slot_table.subject END) as "7:30-9:10AM",max(case when (slot='9:20-11:00AM') then slot_table.subject END) as "9:20-11:00AM",max(case when (slot='11:10-12:50PM') then slot_table.subject END) as "11:10-12:50PM",max(case when (slot='1:40-3:20PM') then slot_table.subject END) as "1:40-3:20PM", max(case when (slot='3:30-5:00PM') then slot_table.subject END) as "3:30-5:00PM" from timetable join slot_table on timetable.slotid = slot_table.slotid group by timetable.day
Result :
a http://www.4shared.com/download/1w7Tyicfce/random3.png
What you want is called a PIVOT query. In one of these, you have a select which gives the data in rows, like your result just under the EDIT (Day, subject, slot). Then you need to specify the values of the row you want to 'pivot' to become columns (slot in this example). Because a Pivot relies on the values of the column to be pivoted it can be difficult to write a general query, and the Postgres Wiki has an example using dymanic SQL and lots of code generating it at http://wiki.postgresql.org/wiki/Pivot_query
In your case, given that slots look like they're fixed and you might be able to hard-code them (that's a decision you'll have make yourself).
NB I am not a Postgres user, but it looks like it can do it (and I would have been very surprised if it couldn't).
This is a pivot or crosstab query. PostgreSQL has only limited support for these via the crosstab function in the tablefunc module.
It can sometimes be better to just deal with this in the application, accumulating the data into a table as you read each data point.

Optimize annotated query in Django

I'm trying to convert this simple SQL query into something Django can handle:
SELECT *
FROM location AS a
WHERE a.travel_distance = (
SELECT MAX(travel_distance)
FROM location AS b
WHERE b.person_id = a.person_id
)
ORDER BY a.travel_distance DESC
What this basically does is fetching all traveled locations and select only the rows that contain the maximum travel distance.
This is what i got so far:
travels = Location.objects.filter(pk__in=Location.objects.order_by().values('person_id').annotate(max_id=Max('id')).values('max_id')).order_by('travel_distance')[::-1]
Although the results match each other. It takes a whole lot longer for the second method to return results.
Is there anyway I can rewrite this query, so it becomes faster?
If I understand correctly you want the maximum distance travelled for each person. Assuming there is a Person model, perhaps ask from the other direction. Something like:
Person.objects.values('id').annotate(max_distance=Max('location__travel_distance'))
I haven't tested this since I don't have an equivalent data schema handy, but does this work for you?
Isn't this works? Something like:
select max(id), sum(travel_distance) from table group by person_id;