Find average number of views before first lead event happens - sql

create table events (
fk_user integer,
event varchar(40),
time integer
);
Insert into events (fk_user, event, time)
VALUES
('1', 'view', '1'),
('1', 'view', '3'),
('1', 'view', '4'),
('1', 'lead', '5'),
('1', 'view', '6'),
('1', 'view', '7'),
('1', 'lead', '9'),
('2', 'view', '1'),
('2', 'lead', '2'),
('2', 'lead', '3'),
('2', 'view', '6'),
('2', 'view', '7'),
('2', 'view', '8'),
('5', 'view', '1'),
('5', 'view', '2'),
('2', 'view', '4'),
('2', 'lead', '5'),
('2', 'view', '9');
What I am trying to find: for fk_user 1 there are 3 'view' events before the first 'lead' event (reading from the top). I want the average 'time' of those first three views. Is it possible to do this with a window function?
Expected output for fk_user 1: (1+3+4)/3 = 2.667 (3 if rounded to the nearest integer)

You can use the MIN window function with a CASE expression to find first_lead_time per fk_user, then use a derived table to average the time of the 'view' rows that come before first_lead_time:
select fk_user, avg(time) as avg_view_time from (
select *,
min(case when event = 'lead' then time end) over (partition by fk_user) as first_lead_time
from events
) t where time < first_lead_time
and event = 'view'
group by fk_user
Another way, using a join instead of a window function:
select e.fk_user, avg(e.time) as avg_view_time
from events e
join (
select min(time) first_lead_time, fk_user
from events
where event = 'lead'
group by fk_user
) t on t.fk_user = e.fk_user
where e.time < t.first_lead_time
and e.event = 'view'
group by e.fk_user
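A minimal sketch for checking both answers end to end, assuming Python's built-in `sqlite3` module and an SQLite build with window-function support (3.25 or later). It loads the question's sample rows, runs the corrected window-function query and the join query, and compares them. `fk_user` 5 has no 'lead' event, so both queries leave it out.

```python
import sqlite3

# Sample rows from the question: (fk_user, event, time)
ROWS = [
    (1, "view", 1), (1, "view", 3), (1, "view", 4), (1, "lead", 5),
    (1, "view", 6), (1, "view", 7), (1, "lead", 9),
    (2, "view", 1), (2, "lead", 2), (2, "lead", 3), (2, "view", 6),
    (2, "view", 7), (2, "view", 8), (5, "view", 1), (5, "view", 2),
    (2, "view", 4), (2, "lead", 5), (2, "view", 9),
]

WINDOW_QUERY = """
SELECT fk_user, AVG(time) AS avg_view_time
FROM (
    SELECT *,
           -- earliest 'lead' time per user; NULL for users with no lead
           MIN(CASE WHEN event = 'lead' THEN time END)
               OVER (PARTITION BY fk_user) AS first_lead_time
    FROM events
) AS t
WHERE time < first_lead_time AND event = 'view'
GROUP BY fk_user
ORDER BY fk_user
"""

JOIN_QUERY = """
SELECT e.fk_user, AVG(e.time) AS avg_view_time
FROM events AS e
JOIN (
    SELECT fk_user, MIN(time) AS first_lead_time
    FROM events
    WHERE event = 'lead'
    GROUP BY fk_user
) AS t ON t.fk_user = e.fk_user
WHERE e.time < t.first_lead_time AND e.event = 'view'
GROUP BY e.fk_user
ORDER BY e.fk_user
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (fk_user INTEGER, event VARCHAR(40), time INTEGER)")
conn.executemany("INSERT INTO events VALUES (?, ?, ?)", ROWS)

window_result = conn.execute(WINDOW_QUERY).fetchall()
join_result = conn.execute(JOIN_QUERY).fetchall()
print(window_result)
```

With this data, user 1 averages 8/3 (views at 1, 3 and 4) and user 2 averages 1.0 (only the view at time 1 precedes the lead at time 2).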

Related

How can I join tables using information from different rows?

I have two similar tables that I would like to join. See reproducible example below.
WHAT NEEDS TO BE DONE
See the comments in the code: concatenating the values '2021-01-01' (column: Date), 'hat' (column: content), 'cat' (column: content) and 'A' (column: Tote) in first_table gives a unique key that can be joined with exactly the same data in second_table. The result would be the first row of the 4 unique events (see desired_result: '#first tote'). In reality there would be a few million rows.
Reproducible example:
CREATE OR REPLACE TABLE
`first_table` (
`Date` string NOT NULL,
`TotearrivalTimestamp` string NOT NULL,
`Tote` string NOT NULL,
`content` string NOT NULL,
`location` string NOT NULL
);
INSERT INTO `first_table` (`Date`, `TotearrivalTimestamp`, `Tote`, `content`, `location`) VALUES
('2021-01-01', '13:00','A','hat','1'), #first tote
('2021-01-01', '13:00','A','cat','1'), #first tote
('2021-01-01', '14:00', 'B', 'toy', '1'),
('2021-01-01', '14:00', 'B', 'cat', '1'),
('2021-01-01', '15:00', 'A', 'toy', '1'),
('2021-01-01', '13:00', 'A', 'toy', '1'),
('2021-01-02', '13:00', 'A', 'hat', '1'),
('2021-01-02', '13:00', 'A', 'cat', '1');
CREATE OR REPLACE TABLE
`second_table` (
`Date` string NOT NULL,
`ToteendingTimestamp` string NOT NULL,
`Tote` string NOT NULL,
`content` string NOT NULL,
`location` string NOT NULL
);
INSERT INTO `second_table` (`Date`, `ToteendingTimestamp`, `Tote`, `content`, `location`) VALUES
('2021-01-01', '20:00', 'B', 'cat', '2'),
('2021-01-01', '19:00', 'A', 'cat', '1'), #first tote
('2021-01-01', '19:00', 'A', 'hat', '1'), #first tote
('2021-01-01', '20:00', 'B', 'toy', '2'),
('2021-01-01', '14:00', 'A', 'toy', '1'),
('2021-01-02', '14:00', 'A', 'hat', '1'),
('2021-01-02', '14:00', 'A', 'cat', '1'),
('2021-01-01', '16:00', 'A', 'toy', '1');
CREATE OR REPLACE TABLE
`desired_result` (
`Date` string NOT NULL,
`Tote` string NOT NULL,
`TotearrivalTimestamp` string NOT NULL,
`ToteendingTimestamp` string NOT NULL,
`location_first_table` string NOT NULL,
`location_second_table` string NOT NULL
);
INSERT INTO `desired_result` (`Date`, `Tote`, `TotearrivalTimestamp`, `ToteendingTimestamp`, `location_first_table`, `location_second_table`) VALUES
('2021-01-01', 'A', '13:00', '19:00', '1', '1'), #first tote
('2021-01-01', 'B', '14:00', '20:00', '1', '1'),
('2021-01-01', 'A', '15:00', '16:00', '1', '2'),
('2021-01-02', 'A', '13:00', '14:00', '1', '1');
#### this does not give what I want####
select first.date as Date, first.tote, first.totearrivaltimestamp, second.toteendingtimestamp, first.location as location_first_table, second.location as location_second_table
from `first_table` first
inner join `second_table` second
on first.tote = second.tote
and first.content = second.content;
I was able to reproduce the 'desired_result' table (mostly) with the SQL below. I believe there are a few typos in the 'insert into' statements, but I think this meets the intent.
Query:
select
first_table.date as Date,
first_table.tote,
first_table.totearrivaltimestamp,
second_table.toteendingtimestamp,
first_table.location as location_first_table,
second_table.location as location_second_table
from first_table
inner join `second_table`
on first_table.Date = second_table.Date
and first_table.tote = second_table.tote
group by first_table.Date, first_table.TotearrivalTimestamp, first_table.tote;
result:
2021-01-01|A|13:00|19:00|1|1
2021-01-01|B|14:00|20:00|1|2
2021-01-01|A|15:00|19:00|1|1
2021-01-02|A|13:00|14:00|1|1
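This result assumes the dates in your first table always line up with the totes/timestamps. The GROUP BY collapses the duplicate rows, and the second-table columns come from a row that matches the first table's date and tote. Note that selecting columns that are neither grouped nor aggregated only works in engines that allow it (e.g. SQLite, or MySQL without ONLY_FULL_GROUP_BY), and which row supplies those values is not guaranteed; stricter engines such as BigQuery reject this query.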
This answer should work. I think your issue might be with how some of your tables and columns are quoted:
select f.`Date`
, f.tote
, f.totearrivaltimestamp
, s.toteendingtimestamp
, f.location as location_first_table
, s.location as location_second_table
from `first_table` f
inner join `second_table` s on f.`Date` = s.`Date`
and f.tote = s.tote
and f.content = s.content
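To see what the composite-key join does with the sample data, here is a minimal sketch using Python's built-in `sqlite3` module (an assumption; the question itself targets BigQuery). Joining on `Date`, `Tote` and `content` together is equivalent to joining on the concatenated key described in the question. The 'hat' and 'cat' rows of one tote produce identical output rows, which `SELECT DISTINCT` collapses; the extra rows that remain come from the two 'toy' rows for tote A on 2021-01-01, each of which matches two 'toy' rows in `second_table`.

```python
import sqlite3

FIRST = [  # (Date, TotearrivalTimestamp, Tote, content, location)
    ("2021-01-01", "13:00", "A", "hat", "1"), ("2021-01-01", "13:00", "A", "cat", "1"),
    ("2021-01-01", "14:00", "B", "toy", "1"), ("2021-01-01", "14:00", "B", "cat", "1"),
    ("2021-01-01", "15:00", "A", "toy", "1"), ("2021-01-01", "13:00", "A", "toy", "1"),
    ("2021-01-02", "13:00", "A", "hat", "1"), ("2021-01-02", "13:00", "A", "cat", "1"),
]
SECOND = [  # (Date, ToteendingTimestamp, Tote, content, location)
    ("2021-01-01", "20:00", "B", "cat", "2"), ("2021-01-01", "19:00", "A", "cat", "1"),
    ("2021-01-01", "19:00", "A", "hat", "1"), ("2021-01-01", "20:00", "B", "toy", "2"),
    ("2021-01-01", "14:00", "A", "toy", "1"), ("2021-01-02", "14:00", "A", "hat", "1"),
    ("2021-01-02", "14:00", "A", "cat", "1"), ("2021-01-01", "16:00", "A", "toy", "1"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE first_table (Date TEXT, TotearrivalTimestamp TEXT, "
             "Tote TEXT, content TEXT, location TEXT)")
conn.execute("CREATE TABLE second_table (Date TEXT, ToteendingTimestamp TEXT, "
             "Tote TEXT, content TEXT, location TEXT)")
conn.executemany("INSERT INTO first_table VALUES (?, ?, ?, ?, ?)", FIRST)
conn.executemany("INSERT INTO second_table VALUES (?, ?, ?, ?, ?)", SECOND)

# Join on all three key columns; {distinct} is filled with "" or "DISTINCT"
JOIN = """
SELECT {distinct} f.Date, f.Tote, f.TotearrivalTimestamp, s.ToteendingTimestamp,
       f.location AS location_first_table, s.location AS location_second_table
FROM first_table AS f
JOIN second_table AS s
  ON s.Date = f.Date AND s.Tote = f.Tote AND s.content = f.content
ORDER BY 1, 2, 3, 4
"""
raw_rows = conn.execute(JOIN.format(distinct="")).fetchall()
rows = conn.execute(JOIN.format(distinct="DISTINCT")).fetchall()
for row in rows:
    print(row)
```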

SQL relationships for unique sets of rows

I am trying to set up a relationship between a couple of tables where a unique set of rows in one table relate to a row in another table.
I have come up with a scenario that reflects what I am trying to accomplish.
In this scenario, we are trying to determine the role(s) that a new hire should be given, based on the set of skills that they possess. An employee can be given multiple roles. For example, a software engineer with management experience is given both the Software Engineer and the Tech Lead roles. However, the roles given must line up exactly with a given skill set. If a new hire comes in with every skill we are looking for, we give them the CTO role. The CTO possesses all of the skills for both the Software Engineer and Tech Lead roles, but is not given those roles.
I believe my issue boils down to the skill_set relationship, where I am trying to tie a unique set of rows from the skill table to a specific skill_set. Any given skill can be in many skill_sets, but when querying for a skill_set, I only want to return the skill_set that contains all of the given skills and no others, and currently I don't know of a good way to query for that specific skill_set.
We don't need to worry about trying to find roles for lists of skills that aren't valid skill_sets. Those can return no role.
Note: This schema is not set in stone. Changing it is definitely an option, so if I have modeled this incorrectly, we can fix that.
CREATE TABLE IF NOT EXISTS `skill` (
`id` int(6) unsigned NOT NULL,
`name` varchar(16) NOT NULL,
PRIMARY KEY (`id`)
) DEFAULT CHARSET=utf8;
CREATE TABLE IF NOT EXISTS `skill_set` (
`id` int(6) unsigned NOT NULL,
`skill_id` int(6) unsigned NOT NULL,
PRIMARY KEY (`id`, `skill_id`)
) DEFAULT CHARSET=utf8;
CREATE TABLE IF NOT EXISTS `default_role` (
`skill_set_id` int(6) unsigned NOT NULL,
`role_id` int(6) unsigned NOT NULL,
PRIMARY KEY (`skill_set_id`, `role_id`)
) DEFAULT CHARSET=utf8;
CREATE TABLE IF NOT EXISTS `role` (
`id` int(6) unsigned NOT NULL,
`name` varchar(32) NOT NULL,
PRIMARY KEY (`id`)
) DEFAULT CHARSET=utf8;
INSERT INTO `skill` (`id`, `name`) VALUES
('1', 'python'),
('2', 'javascript'),
('3', 'ec2'),
('4', 'docker'),
('5', 'management');
INSERT INTO `skill_set` (`id`, `skill_id`) VALUES
('1', '1'),
('2', '2'),
('3', '1'),
('3', '2'),
('4', '3'),
('5', '4'),
('6', '1'),
('6', '2'),
('6', '5'),
('7', '3'),
('7', '4'),
('7', '5'),
('8', '1'),
('8', '2'),
('8', '3'),
('8', '4'),
('8', '5');
INSERT INTO `default_role` (`skill_set_id`, `role_id`) VALUES
('1', '1'),
('2', '1'),
('3', '2'),
('4', '3'),
('5', '3'),
('6', '2'),
('6', '4'),
('7', '3'),
('7', '4'),
('8', '5');
INSERT INTO `role` (`id`, `name`) VALUES
('1', 'Junior Software Engineer'),
('2', 'Software Engineer'),
('3', 'DevOps Engineer'),
('4', 'Tech Lead'),
('5', 'CTO');
A SQL fiddle is also available: http://sqlfiddle.com/#!9/86bcfe0
Some example outputs:
Given the skills: ['python']
Return the default role: Junior Software Engineer
Given the skills: ['python', 'javascript']
Return the default role: Software Engineer
Given the skills: ['ec2']
Return the default role: DevOps Engineer
Given the skills: ['python', 'javascript', 'management']
Return the default roles: Software Engineer, Tech Lead
Given the skills: ['python', 'javascript', 'ec2', 'docker', 'management']
Return the default role: CTO
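One common way to express "the skill_set containing exactly these skills" is relational division: group the `skill_set` rows by `id`, and keep the set whose total skill count equals the number of requested skills and in which every skill matches one of the requested names. Below is a minimal sketch using Python's built-in `sqlite3` module, assuming skill names are unique; `roles_for` is a hypothetical helper, and the DDL drops the MySQL-specific `int(6) unsigned` and `CHARSET` details.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE skill (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE skill_set (id INTEGER NOT NULL, skill_id INTEGER NOT NULL,
                        PRIMARY KEY (id, skill_id));
CREATE TABLE default_role (skill_set_id INTEGER NOT NULL, role_id INTEGER NOT NULL,
                           PRIMARY KEY (skill_set_id, role_id));
CREATE TABLE role (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
INSERT INTO skill VALUES (1,'python'),(2,'javascript'),(3,'ec2'),(4,'docker'),(5,'management');
INSERT INTO skill_set VALUES (1,1),(2,2),(3,1),(3,2),(4,3),(5,4),(6,1),(6,2),(6,5),
    (7,3),(7,4),(7,5),(8,1),(8,2),(8,3),(8,4),(8,5);
INSERT INTO default_role VALUES (1,1),(2,1),(3,2),(4,3),(5,3),(6,2),(6,4),(7,3),(7,4),(8,5);
INSERT INTO role VALUES (1,'Junior Software Engineer'),(2,'Software Engineer'),
    (3,'DevOps Engineer'),(4,'Tech Lead'),(5,'CTO');
""")


def roles_for(skills):
    """Hypothetical helper: default roles of the skill_set that matches `skills` exactly."""
    wanted = sorted(set(skills))
    placeholders = ", ".join("?" for _ in wanted)
    sql = f"""
    SELECT r.name
    FROM role AS r
    JOIN default_role AS dr ON dr.role_id = r.id
    WHERE dr.skill_set_id IN (
        SELECT ss.id
        FROM skill_set AS ss
        JOIN skill AS s ON s.id = ss.skill_id
        GROUP BY ss.id
        -- same number of skills as requested, and every skill in the set was requested
        HAVING COUNT(*) = ?
           AND SUM(CASE WHEN s.name IN ({placeholders}) THEN 1 ELSE 0 END) = ?
    )
    ORDER BY r.name
    """
    params = [len(wanted), *wanted, len(wanted)]
    return [name for (name,) in conn.execute(sql, params)]


print(roles_for(["python", "javascript", "management"]))
```

Because a set must have exactly as many skills as were requested and all of them must be among the requested names, a larger set such as the CTO set never matches a smaller request, which is the "line up exactly" rule from the question.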

How to find how many customers buy more than one product from a table

Table sales:
create table sales (
Date date,
customer_id integer,
product_id integer,
units_sold integer,
paid_amount integer
);
Insert into sales (Date, customer_id, product_id, units_sold, paid_amount)
VALUES
('2016-01-01', '1', '1', '5', '45'),
('2016-01-01', '2', '1', '2', '18'),
('2016-01-01', '3', '2', '7', '35'),
('2016-01-07', '1', '3', '3', '45'),
('2016-01-07', '2', '2', '5', '25'),
('2016-01-07', '4', '2', '5', '25'),
('2016-01-10', '1', '4', '5', '30'),
('2016-01-10', '2', '4', '5', '30'),
('2016-01-10', '4', '5', '6', '60'),
('2016-01-10', '4', '3', '9', '135'),
('2016-01-14', '3', '1', '4', '60'),
('2016-01-14', '2', '3', '6', '90'),
('2016-01-14', '2', '3', '6', '90');
How many customers bought more than one different product on every visit (i.e. day)?
Group by date and customer, then count the distinct products in each group:
select customer_id, count(distinct product_id) as item_purchased
from sales
group by Date, customer_id
having count(distinct product_id) > 1;
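The query above lists each (customer, day) pair with more than one distinct product. To answer "how many customers" literally, one more level of aggregation is needed, and "on every visit" gives a different count than "on at least one visit". A minimal sketch using Python's built-in `sqlite3` module (an assumption; the question does not name an RDBMS), with the question's sample rows:

```python
import sqlite3

SALES = [  # (Date, customer_id, product_id, units_sold, paid_amount)
    ("2016-01-01", 1, 1, 5, 45), ("2016-01-01", 2, 1, 2, 18),
    ("2016-01-01", 3, 2, 7, 35), ("2016-01-07", 1, 3, 3, 45),
    ("2016-01-07", 2, 2, 5, 25), ("2016-01-07", 4, 2, 5, 25),
    ("2016-01-10", 1, 4, 5, 30), ("2016-01-10", 2, 4, 5, 30),
    ("2016-01-10", 4, 5, 6, 60), ("2016-01-10", 4, 3, 9, 135),
    ("2016-01-14", 3, 1, 4, 60), ("2016-01-14", 2, 3, 6, 90),
    ("2016-01-14", 2, 3, 6, 90),
]

# Distinct products per customer per visit (day)
PER_VISIT = """
SELECT Date, customer_id, COUNT(DISTINCT product_id) AS n_products
FROM sales
GROUP BY Date, customer_id
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (Date DATE, customer_id INTEGER, product_id INTEGER, "
             "units_sold INTEGER, paid_amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?, ?)", SALES)

# Customers with more than one distinct product on at least one visit
at_least_once = conn.execute(
    f"SELECT COUNT(DISTINCT customer_id) FROM ({PER_VISIT}) AS v WHERE n_products > 1"
).fetchone()[0]

# Customers with more than one distinct product on every visit
every_visit = conn.execute(
    f"SELECT COUNT(*) FROM (SELECT customer_id FROM ({PER_VISIT}) AS v "
    "GROUP BY customer_id HAVING MIN(n_products) > 1) AS c"
).fetchone()[0]

print(at_least_once, every_visit)
```

With this data only customer 4 (two products on 2016-01-10) counts for the first question, and no customer counts for the second, because customer 4 bought a single product on 2016-01-07.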

Insert and Update multiple values in single SQL Statement

We are using INSERT statements for multi-row inserts like this:
INSERT INTO [db1].[dbo].[tb1] ([ID], [CLM1], [CLM2])
VALUES
('1', 'A', 'DB'),
('2', 'AB', 'BQ'),
('3', 'AA', 'BH'),
('4', 'AD', 'BT'),
('5', 'AF', 'EB'),
('6', 'EA', 'AB');
In the table above, ID is the primary key. I want a single query that takes all of these values, updates the existing records, and inserts the new ones into the table.
You can use Merge:
MERGE INTO [db1].[dbo].[tb1] AS Target
USING (
VALUES
('1', 'A', 'DB'),
('2', 'AB', 'BQ'),
('3', 'AA', 'BH'),
('4', 'AD', 'BT'),
('5', 'AF', 'EB'),
('6', 'EA', 'AB')
) AS Source (new_ID, new_CLM1, new_CLM2)
ON Target.ID = Source.new_ID
WHEN MATCHED THEN
UPDATE SET
CLM1 = Source.new_CLM1,
CLM2 = Source.new_CLM2
WHEN NOT MATCHED BY TARGET THEN
INSERT (ID, CLM1, CLM2) VALUES (new_ID, new_CLM1, new_CLM2);
See the MERGE (Transact-SQL) documentation for details.
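MERGE is SQL Server (and standard SQL) syntax. For comparison, here is a minimal sketch of the same insert-or-update on SQLite, which has no MERGE but has supported `INSERT ... ON CONFLICT ... DO UPDATE` since version 3.24, run through Python's built-in `sqlite3` module. The table name `tb1` and the three pre-existing rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tb1 (ID TEXT PRIMARY KEY, CLM1 TEXT, CLM2 TEXT)")
# Hypothetical existing rows: IDs 1-3 will be updated, IDs 4-6 inserted
conn.executemany("INSERT INTO tb1 VALUES (?, ?, ?)",
                 [("1", "old", "old"), ("2", "old", "old"), ("3", "old", "old")])

NEW_VALUES = [("1", "A", "DB"), ("2", "AB", "BQ"), ("3", "AA", "BH"),
              ("4", "AD", "BT"), ("5", "AF", "EB"), ("6", "EA", "AB")]

conn.executemany(
    """
    INSERT INTO tb1 (ID, CLM1, CLM2) VALUES (?, ?, ?)
    ON CONFLICT(ID) DO UPDATE SET
        CLM1 = excluded.CLM1,  -- 'excluded' is the row that could not be inserted
        CLM2 = excluded.CLM2
    """,
    NEW_VALUES,
)
result = conn.execute("SELECT ID, CLM1, CLM2 FROM tb1 ORDER BY ID").fetchall()
print(result)
```

As in the MERGE version, the primary key is only used to detect the conflict; it does not need to be updated.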

Adding multiple rows in SQL

I am asked to add 8 rows to a table:
insert into Rating ( rID, mID, stars, ratingDate )
values ('207', '101', '5', null), ('207', '102', '5', null),
('207', '103', '5', null), ('207', '104', '5', null),
('207', '105', '5', null), ('207', '106', '5', null),
('207', '107', '5', null), ('207', '108', '5', null)
This works when inserting a single row, but inserting multiple rows gives the error:
Query failed to execute: near ",": syntax error
What is missing?
A late answer
If you are using SQLite 3.7.11 or later, you can insert multiple rows with this syntax:
SIMPLEST WAY
INSERT INTO Rating (rID, mID, stars, ratingDate) VALUES ('207', '102', '5', null) , ('207', '102', '5', null) , ('207', '102', '5', null)
The statement posted in the question does work on SQLite 3.7.11 or later.
SELECT CLAUSE
insert into Rating
SELECT '207' AS rID, '101' AS mID, '5' AS stars, null AS ratingDate
UNION SELECT '207', '102', '5', null
UNION SELECT '207', '103', '5', null
UNION SELECT '207', '104', '5', null
UNION SELECT '207', '105', '5', null
UNION SELECT '207', '106', '5', null
UNION SELECT '207', '107', '5', null
UNION SELECT '207', '108', '5', null
or, with an explicit column list:
insert into Rating (rID, mID, stars, ratingDate)
SELECT '207', '101', '5', null
UNION SELECT '207', '102', '5', null
UNION SELECT '207', '103', '5', null
UNION SELECT '207', '104', '5', null
UNION SELECT '207', '105', '5', null
UNION SELECT '207', '106', '5', null
UNION SELECT '207', '107', '5', null
UNION SELECT '207', '108', '5', null
REMEMBER: if you do not need duplicates removed from the set of inserted values, use UNION ALL instead of UNION; it is a little faster because it skips the duplicate check.
I assume your RDBMS doesn't support this construct. In that case, insert the rows one at a time:
insert into Rating ( rID, mID, stars, ratingDate )
values ('207', '101', '5', null);
insert into Rating ( rID, mID, stars, ratingDate )
values ('207', '102', '5', null);
.....
I suggest:
insert into Rating ( rID, mID, stars, ratingDate ) values ('207', '101', '5', null);
insert into Rating ( rID, mID, stars, ratingDate ) values ('207', '102', '5', null);
...
insert into Rating ( rID, mID, stars, ratingDate ) values ('207', '108', '5', null);
I created the table in SQLite. The table creation script is as follows:
create table Rating (rID varchar(10),mID varchar(10),stars varchar(10),ratingDate date);
I used the following query to insert into that table, and it works fine for me.
insert into Rating ( rID, mID, stars, ratingDate )
values ('207', '101', '5', null), ('207', '102', '5', null),
('207', '103', '5', null), ('207', '104', '5', null),
('207', '105', '5', null), ('207', '106', '5', null),
('207', '107', '5', null), ('207', '108', '5', null);
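A minimal sketch, using Python's built-in `sqlite3` module, that checks two points made in the answers above: a multi-row VALUES list works on SQLite 3.7.11 or later, and UNION removes duplicate rows from the inserted set while UNION ALL keeps them (shown here with three identical rows, as in the "SIMPLEST WAY" example). The `Rating` schema is the one from the last answer; the helper `count_inserted` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Rating (rID varchar(10), mID varchar(10), "
             "stars varchar(10), ratingDate date)")

# Multi-row VALUES list from the question (needs SQLite 3.7.11+)
conn.execute("""
INSERT INTO Rating (rID, mID, stars, ratingDate)
VALUES ('207', '101', '5', NULL), ('207', '102', '5', NULL),
       ('207', '103', '5', NULL), ('207', '104', '5', NULL),
       ('207', '105', '5', NULL), ('207', '106', '5', NULL),
       ('207', '107', '5', NULL), ('207', '108', '5', NULL)
""")
multi_row_count = conn.execute("SELECT COUNT(*) FROM Rating").fetchone()[0]


def count_inserted(set_op):
    """Hypothetical helper: empty the table, insert 3 identical rows joined by set_op."""
    conn.execute("DELETE FROM Rating")
    conn.execute(f"""
    INSERT INTO Rating (rID, mID, stars, ratingDate)
    SELECT '207', '102', '5', NULL
    {set_op} SELECT '207', '102', '5', NULL
    {set_op} SELECT '207', '102', '5', NULL
    """)
    return conn.execute("SELECT COUNT(*) FROM Rating").fetchone()[0]


union_count = count_inserted("UNION")          # duplicates removed
union_all_count = count_inserted("UNION ALL")  # duplicates kept
print(multi_row_count, union_count, union_all_count)
```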