Finding all events happening at a specific hour - sql

I have a database of events and I want to make a daily schedule from it.
It looks something like the following:
+-----+-----+---+--------+
|Event|Start|End|Duration|
+-----+-----+---+--------+
|A    |08   |10 |2       |
+-----+-----+---+--------+
|B    |09   |10 |1       |
+-----+-----+---+--------+
|C    |13   |15 |2       |
+-----+-----+---+--------+
I want to query for all events that are held at 9, and I can't figure out the math behind calculating the time.
The query should return A and B for this example. I tried:
start + duration > 9 and start <= 9
but it isn't correct...
Any help please?

What you want is this where clause:
where start <= 9 and end > 9
That is, something happens at 9 if it starts at or before 9 and ends after 9. (If you want events that end at 9 to be included, just change the > to >=.)
I notice that you have leading zeros. This suggests that the values are stored as strings. In that case, do string comparisons:
where start <= '09' and end > '09'
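A quick way to sanity-check the string-comparison version is with an in-memory SQLite database (SQLite here is only for illustration; the asker's engine is not stated, and `End` must be quoted because END is a reserved word in SQL):

```python
import sqlite3

# Reproduce the question's table; hours are zero-padded strings,
# so string comparison orders them correctly.
con = sqlite3.connect(":memory:")
con.execute('CREATE TABLE events (Event TEXT, Start TEXT, "End" TEXT, Duration INT)')
con.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", [
    ("A", "08", "10", 2),
    ("B", "09", "10", 1),
    ("C", "13", "15", 2),
])
# The accepted where clause, in its string-comparison form.
rows = con.execute(
    """SELECT Event FROM events WHERE Start <= '09' AND "End" > '09'"""
).fetchall()
print([r[0] for r in rows])  # ['A', 'B']
```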

SQL: Pushing a Dataset "Backwards"?

I have the following dataset that contains information on different students who took a fitness test: the date they took the test, their weight at the time of the test, and whether or not they passed:
  name  date_test_taken  result_of_test  weight_after_time_of_test
1 john  2013-01-01       pass            165
2 john  2016-01-01       fail            183
3 john  2017-01-01       fail            175
4 john  2020-01-01       pass            182
5 alex  2019-01-01       fail            220
6 alex  2020-01-01       fail            225
7 tim   2018-01-01       pass            176
In this example, the student participates in the fitness test, then is told if they passed or failed, and then the student records their weight. The students don't necessarily take the test every year.
I am interested in building a statistical/machine learning model that will predict whether the student will pass or fail the NEXT fitness test they take based on the CURRENT weight of the student AND the result of their last fitness test.
This means if you take the second row of this dataset - John weighed 183 lbs after his second test, but his last known weight was actually 165 lbs. Therefore, I would be interested in "shifting" the dataset backward for each student. I am interested in predicting if John would have passed his second fitness test when his last known weight was 165 lbs and not 183 lbs.
Thus, using SQL code, I would like to "shift" the data for each student backward to modify the dataset. This way, the teacher can predict who will fail the next fitness test based on the results of the current fitness test - and then help those students more throughout the year.
Can someone please show me how to do this?
Thanks!
Obviously, with the information provided, we won't be able to provide you with the model to predict results. However, we can help with providing the data you need for your modelling.
The key task here is to get an individual's previous results into the same row (for analysis) as their current results.
SQL (at least SQL Server, and multiple other products) has the functions LEAD and LAG, which let you 'look forward' (LEAD) or backward (LAG) through the dataset according to a given sorting and partitioning mechanism that tells it how to identify the previous/next rows.
In this case, we want to partition by the individual (name) and take their previous result (LAG, 1 row back), ordered by the date they took the test.
The following SQL gets the previous results for an individual onto the same row as the current data (note - I'm assuming the data table is called #FT and the first column is called 'Auto_ID'):
SELECT [name],
       [date_test_taken],
       LAG([date_test_taken], 1) OVER (PARTITION BY [name] ORDER BY [date_test_taken], [Auto_Id]) AS [date_test_taken_Previous],
       LAG([result_of_test], 1) OVER (PARTITION BY [name] ORDER BY [date_test_taken], [Auto_Id]) AS [result_of_test_Previous],
       LAG([weight_after_time_of_test], 1) OVER (PARTITION BY [name] ORDER BY [date_test_taken], [Auto_Id]) AS [weight_after_time_of_test_Previous],
       [result_of_test],
       [weight_after_time_of_test]
FROM #FT
Note that if they don't have a previous record, the previous results are NULL.
Here are the results:
|name|date_test_taken|date_test_taken_Previous|result_of_test_Previous|weight_after_time_of_test_Previous|result_of_test|weight_after_time_of_test|
|----|---------------|------------------------|-----------------------|----------------------------------|--------------|-------------------------|
|alex|2019-01-01     |NULL                    |NULL                   |NULL                              |fail          |220.00                   |
|alex|2020-01-01     |2019-01-01              |fail                   |220.00                            |fail          |225.00                   |
|john|2013-01-01     |NULL                    |NULL                   |NULL                              |pass          |165.00                   |
|john|2016-01-01     |2013-01-01              |pass                   |165.00                            |fail          |183.00                   |
|john|2017-01-01     |2016-01-01              |fail                   |183.00                            |fail          |175.00                   |
|john|2020-01-01     |2017-01-01              |fail                   |175.00                            |pass          |182.00                   |
|tim |2018-01-01     |NULL                    |NULL                   |NULL                              |pass          |176.00                   |
To see it in action, here is a dbfiddle with the data, query, and results.
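The same LAG technique runs anywhere window functions are supported. A minimal runnable sketch using Python's sqlite3 (SQLite 3.25+ supports window functions; table and column names follow the question's data):

```python
import sqlite3

# Load the question's sample data into an in-memory SQLite table.
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE ft (
    Auto_Id INT, name TEXT, date_test_taken TEXT,
    result_of_test TEXT, weight_after_time_of_test REAL)""")
con.executemany("INSERT INTO ft VALUES (?, ?, ?, ?, ?)", [
    (1, "john", "2013-01-01", "pass", 165),
    (2, "john", "2016-01-01", "fail", 183),
    (3, "john", "2017-01-01", "fail", 175),
    (4, "john", "2020-01-01", "pass", 182),
    (5, "alex", "2019-01-01", "fail", 220),
    (6, "alex", "2020-01-01", "fail", 225),
    (7, "tim",  "2018-01-01", "pass", 176),
])
# LAG over a named window: partition per student, ordered by test date.
rows = con.execute("""
    SELECT name, date_test_taken,
           LAG(result_of_test) OVER w AS result_prev,
           LAG(weight_after_time_of_test) OVER w AS weight_prev,
           result_of_test
    FROM ft
    WINDOW w AS (PARTITION BY name ORDER BY date_test_taken, Auto_Id)
    ORDER BY name, date_test_taken
""").fetchall()
for r in rows:
    print(r)
# john's 2016 row now carries his previous (2013) weight of 165,
# and each student's first row has NULL previous values.
```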

Groupby with when condition in Pyspark

My data frame looks like:
+---+----------+--------------------+
|id |reg_date  |txn_date            |
+---+----------+--------------------+
|1  |2019-01-06|2019-02-15 12:51:15 |
|1  |2019-01-06|2019-03-29 13:15:27 |
|1  |2019-01-06|2019-06-01 01:42:57 |
|1  |2019-01-06|2019-01-06 17:01:...|
|5  |2019-06-16|2019-07-19 11:50:34 |
|5  |2019-06-16|2019-07-13 19:49:39 |
|5  |2019-06-16|2019-08-27 17:37:22 |
|2  |2018-07-30|2019-01-01 07:03:...|
|2  |2018-07-30|2019-07-30 01:27:57 |
|2  |2018-07-30|2019-02-01 00:08:35 |
+---+----------+--------------------+
I want to pick up the first txn_date after reg_date, i.e. the first txn_date with reg_date >= txn_date.
Expected output:
+---+----------+--------------------+
|id |reg_date  |txn_date            |
+---+----------+--------------------+
|1  |2019-01-06|2019-01-06 17:01:...|
|5  |2019-06-16|2019-07-13 19:49:39 |
|2  |2018-07-30|2019-07-30 01:27:57 |
+---+----------+--------------------+
Here is what I have done so far:
df = df.withColumn('txn_date',to_date(unix_timestamp(F.col('txn_date'),'yyyy-MM-dd HH:mm:ss').cast("timestamp")))
df = df.withColumn('reg_date',to_date(unix_timestamp(F.col('reg_date'),'yyyy-MM-dd').cast("timestamp")))
gg = df.groupBy('id','reg_date').agg(min(F.col('txn_date')))
But I am getting wrong results.
The condition reg_date >= txn_date can be ambiguous.
Does 2019-01-06>=2019-01-06 17:01:30 mean 2019-01-06 00:00:00>=2019-01-06 17:01:30 or 2019-01-06 23:59:59>=2019-01-06 17:01:30?
In your example, 2019-01-06>=2019-01-06 17:01:30 is evaluated to be true, so I assume it is the latter case, i.e. the case with 23:59:59.
Proceeding with the assumption above, here is how I coded it.
import pyspark.sql.functions as F
#create a sample data frame
data = [('2019-01-06','2019-02-15 12:51:15'),('2019-01-06','2019-03-29 13:15:27'),('2019-01-06','2019-01-06 17:01:30'),
        ('2019-07-30','2019-07-30 07:03:01'),('2019-07-30','2019-07-30 01:27:57'),('2019-07-30','2019-07-30 00:08:35')]
cols = ('reg_date', 'txn_date')
df = spark.createDataFrame(data, cols)
#add 23:59:59 to reg_date as a dummy_date for a timestamp comparison later
df = df.withColumn('dummy_date', F.concat(F.col('reg_date'), F.lit(' 23:59:59')))
#convert columns to the appropriate time data types
df = df.select([F.to_date(F.col('reg_date'),'yyyy-MM-dd').alias('reg_date'),
                F.to_timestamp(F.col('txn_date'),'yyyy-MM-dd HH:mm:ss').alias('txn_date'),
                F.to_timestamp(F.col('dummy_date'),'yyyy-MM-dd HH:mm:ss').alias('dummy_date')])
#implementation part
(df.orderBy('reg_date')
   .filter(F.col('dummy_date') >= F.col('txn_date'))
   .groupBy('reg_date')
   .agg(F.first('txn_date').alias('txn_date'))
   .show())
#+----------+-------------------+
#|  reg_date|           txn_date|
#+----------+-------------------+
#|2019-01-06|2019-01-06 17:01:30|
#|2019-07-30|2019-07-30 07:03:01|
#+----------+-------------------+
You don't need to order. You can discard all the later values with a filter, then aggregate per group and take the smallest timestamp, since the earliest qualifying timestamp is the minimum. Something like:
df.filter(df.reg_date >= df.txn_date) \
  .groupBy(df.reg_date) \
  .agg(F.min(df.txn_date)) \
  .show()
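The filter-then-min idea can be checked without a Spark session. A plain-Python sketch over the sample data from the first answer (note that taking the minimum yields the earliest qualifying transaction, 00:08:35 for 2019-07-30, whereas F.first depends on the incoming row order):

```python
from datetime import datetime

# Same sample data as the answer above: (reg_date, txn_date) pairs.
rows = [
    ("2019-01-06", "2019-02-15 12:51:15"),
    ("2019-01-06", "2019-03-29 13:15:27"),
    ("2019-01-06", "2019-01-06 17:01:30"),
    ("2019-07-30", "2019-07-30 07:03:01"),
    ("2019-07-30", "2019-07-30 01:27:57"),
    ("2019-07-30", "2019-07-30 00:08:35"),
]
result = {}
for reg, txn in rows:
    # The dummy_date trick: compare against the end of reg_date's day.
    reg_end = datetime.strptime(reg + " 23:59:59", "%Y-%m-%d %H:%M:%S")
    txn_ts = datetime.strptime(txn, "%Y-%m-%d %H:%M:%S")
    if txn_ts <= reg_end:  # the filter step
        # the min-aggregation step: keep the earliest qualifying txn
        if reg not in result or txn_ts < result[reg]:
            result[reg] = txn_ts
print(result)
```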

Crystal reports Sum if a previous field is the same to current field on section 3(details)

Hi, I'm just a newbie programmer and I have a problem with my Crystal Reports.
I have a table named "payroll" that has these fields (ID, FullName, NetSalary).
I have inserted 3 records into my MySQL table:
ID|FullName|Netsalary
1 |Cris Tiu|500
2 |Mat Joe |100
3 |Mat Joe |400
How can I make it look like this?
I don't want to group them by FullName and give a total; instead, if the full name is duplicated, it should be displayed once with the net salary totalled.
ID|FullName|Netsalary
1 |Cris Tiu|500
2 |Mat Joe |500
I have tried adding a formula that contains this code:
if {Fullname} = previous({Fullname}) then
    Sum({Netsalary})
else
    {Netsalary}
but it gives me a display like this:
ID|FullName|Netsalary
1 |Cris Tiu|"Blank"
2 |Mat Joe |100
3 |Mat Joe |500
Please help me, my work depends on this. Thank you in advance.
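At the data level, the desired output is a plain aggregation: one row per name, with the smallest ID and the summed NetSalary. This is not a Crystal Reports formula, just a sketch of the equivalent query one could push into the report's datasource (shown here with an in-memory SQLite stand-in for the MySQL table):

```python
import sqlite3

# Reproduce the three rows from the question.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE payroll (ID INT, FullName TEXT, NetSalary INT)")
con.executemany("INSERT INTO payroll VALUES (?, ?, ?)", [
    (1, "Cris Tiu", 500),
    (2, "Mat Joe", 100),
    (3, "Mat Joe", 400),
])
# Collapse duplicate names, keeping the first ID and summing salaries.
rows = con.execute("""
    SELECT MIN(ID) AS ID, FullName, SUM(NetSalary) AS NetSalary
    FROM payroll
    GROUP BY FullName
    ORDER BY ID
""").fetchall()
print(rows)  # [(1, 'Cris Tiu', 500), (2, 'Mat Joe', 500)]
```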

Grouping of data

I have a database that records clients who are given a rating score between 0 and 50 upon entry to the service we provide. They are seen on average once a week, and after four sessions they are re-evaluated on the same score to see a trend: say they initially score 22, and after four weeks it may be 44.
What I am after is a SQL query to group this data:
+----+-------+--------+
|name|initial|followup|
+----+-------+--------+
|joe |22 | |
+----+-------+--------+
|joe | |44 |
+----+-------+--------+
I want this to show:
+----+-------+--------+
|name|initial|followup|
+----+-------+--------+
|joe |22 |44 |
+----+-------+--------+
I know this is a simple question and I have done this before, but 'tis the time of the year and the pressure is on from management.
Many thanks in advance.
Assuming the blank cells mean NULL, just use aggregation:
select name, max(initial) as initial, max(followup) as followup
from t
group by name;
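A minimal sketch of this aggregation with an in-memory SQLite database (assuming, as the answer does, that the empty cells are stored as NULL, which MAX() skips):

```python
import sqlite3

# Two rows per client, each with one score filled in and the other NULL.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (name TEXT, initial INT, followup INT)")
con.executemany("INSERT INTO t VALUES (?, ?, ?)", [
    ("joe", 22, None),
    ("joe", None, 44),
])
# MAX ignores NULLs, so grouping by name collapses the pair onto one row.
rows = con.execute(
    "SELECT name, MAX(initial) AS initial, MAX(followup) AS followup "
    "FROM t GROUP BY name"
).fetchall()
print(rows)  # [('joe', 22, 44)]
```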

SQL Query : Calculating cross distances based on Master detail predefined tables

I have a database with many tables, in particular two: one stores paths and the other stores the cities of a path.
Table Paths [ PathID, Name ]
Table Routes [ ID, PathID (Foreign Key), City, GoTime, BackTime, GoDistance, BackDistance ]
Table Paths:
---------------------------------------
|PathID |Name                         |
|-------+-----------------------------|
|1      |NewYork Casablanca Alpha 1   |
|7      |Paris Dubai 6007 10:00       |
---------------------------------------
Table Routes:
ID PathID City       GoTime BackTime GoDistance BackDistance
1  1      NewYork    08:00  23:46    5810       NULL
2  1      Casablanca 15:43  16:03    NULL       5800
3  7      Paris      10:20  14:01    3215       NULL
4  7      Cairo      14:50  09:31    2425       3215
5  7      Dubai      18:21  06:00    NULL       2425
I want a query that gives me all the possible combinations inside the same path, something like:
PathID CityFrom CityTo Distance
I don't know if I made myself clear or not, but I hope you guys can help me. Thanks in advance.
This is the correct answer, done manually:
------------------------------------------------------
|PathID |Go_Back |CityA |CityB |Distance|
|-------+-----------+-----------+-----------+--------|
|1 |Go |NewYork |Casablanca |5810 |
|1 |Back |Casablanca |NewYork |5800 |
|7 |Go |Paris |Cairo |3215 |
|7 |Go |Paris |Dubai |5640 |
|7 |Go |Cairo |Dubai |2425 |
|7 |Back |Dubai |Cairo |2425 |
|7 |Back |Dubai |Paris |5640 |
|7 |Back |Cairo |Paris |3215 |
------------------------------------------------------
This comes down to two questions.
Q1:
How to split up the column "Name" from table "Paths" so that it is in first normal form (see Wikipedia for a definition: the domain of each attribute contains only atomic values, and the value of each attribute contains only a single value from that domain). You must do this yourself, and it might be cumbersome to use the text-processing functions of your database to split up the non-atomic column values.
Write a script (Perl/Python/...) that does this, and re-import the results into a new table.
Q2:
How to calculate the "possible path combinations".
Maybe it is possible with a simple SQL query, by sorting the table; you haven't shown enough data.
Ultimately, this can be done with recursive SQL. Postgres can do this, but it is an advanced topic.
You definitely must decide whether your paths can contain loops. (A traveller might decide to take a circular detour many times; although it makes no sense practically, mathematically it is possible.)
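For the "Go" direction of the manual answer, the pairwise combinations can be sketched with a self-join ordered by GoTime, summing the leg distances between the two stops. This assumes each row's GoDistance is the distance from that city to the next stop, which matches the sample data; the "Back" direction would be symmetric using BackTime/BackDistance. A runnable sketch with SQLite in Python:

```python
import sqlite3

# Simplified copy of the Routes table ("Go" columns only).
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE routes (
    ID INT, PathID INT, City TEXT, GoTime TEXT, GoDistance INT)""")
con.executemany("INSERT INTO routes VALUES (?, ?, ?, ?, ?)", [
    (1, 1, "NewYork",    "08:00", 5810),
    (2, 1, "Casablanca", "15:43", None),
    (3, 7, "Paris",      "10:20", 3215),
    (4, 7, "Cairo",      "14:50", 2425),
    (5, 7, "Dubai",      "18:21", None),
])
# Self-join: every earlier stop paired with every later stop in the same
# path; the correlated subquery sums the leg distances in between.
rows = con.execute("""
    SELECT a.PathID, a.City AS CityFrom, b.City AS CityTo,
           (SELECT SUM(m.GoDistance) FROM routes m
            WHERE m.PathID = a.PathID
              AND m.GoTime >= a.GoTime AND m.GoTime < b.GoTime) AS Distance
    FROM routes a
    JOIN routes b ON a.PathID = b.PathID AND a.GoTime < b.GoTime
    ORDER BY a.PathID, a.GoTime, b.GoTime
""").fetchall()
for r in rows:
    print(r)
# Reproduces the four "Go" rows of the manual answer,
# e.g. (7, 'Paris', 'Dubai', 5640).
```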