How to count ids across two distinct columns in SQL - sql

I have a table that contains two different columns: a_id and b_id.
I want to process these two columns and count the distinct number of job ids in each column and report that. Ultimately, I want to determine how many total jobs were created on any given day and how many jobs were completed on any given day.
I started writing a rough query to solve this but realized that the SELECT DISTINCT is more complicated than I thought. I am also producing only one column, total_jobs, and I think there should be two: total jobs created and total jobs completed, based on job_created_date and job_completed_date. Kind of lost here.
SELECT
job_created_date,
job_completed_date,
category,
COUNT(
SELECT
DISTINCT a_id, b_id
FROM my_data_table
) AS total_jobs
FROM my_data_table
WHERE
ds BETWEEN '<DATEID-8>' AND '<DATEID-1>'
GROUP BY
1, 2, 3, 4
I want the output to help me create a bar graph with dates on the x-axis and stacked bars representing # of jobs created on that day and # of jobs remaining to be completed on that day.

I don't know what your data looks like, but I can speculate that you have one row per job and job_completed_date is non-NULL for completed jobs.
If so, you can do:
SELECT t.*,
COUNT(*) OVER () as total_jobs,
COUNT(t.job_completed_date) OVER () as total_completed_jobs
FROM my_data_table t
WHERE t.ds BETWEEN '<DATEID-8>' AND '<DATEID-1>';
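
To see how the two counts differ, here is a minimal sketch run through Python's built-in sqlite3 module. The sample rows and the use of SQLite (3.25 or newer, for window functions) are assumptions for illustration, not the asker's real data or engine. It relies on COUNT(*) counting every row while COUNT(column) skips NULLs.

```python
import sqlite3

# Hypothetical sample data: one row per job; job_completed_date is NULL
# while the job is still open (the assumption stated in the answer).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_data_table ("
    "a_id INTEGER, job_created_date TEXT, job_completed_date TEXT)"
)
conn.executemany(
    "INSERT INTO my_data_table VALUES (?, ?, ?)",
    [
        (1, "2019-01-01", "2019-01-02"),
        (2, "2019-01-01", None),  # still open
        (3, "2019-01-02", "2019-01-02"),
    ],
)

# COUNT(*) counts every row; COUNT(column) ignores rows where it is NULL.
rows = conn.execute(
    """
    SELECT a_id,
           COUNT(*) OVER () AS total_jobs,
           COUNT(job_completed_date) OVER () AS total_completed_jobs
    FROM my_data_table
    ORDER BY a_id
    """
).fetchall()

for row in rows:
    print(row)
```

Every output row carries the same two totals, because OVER () with no PARTITION BY treats the whole result set as one window.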

Related

(Hive) SQL retrieving data from a column that has 1 to N relationship in another column

How can I retrieve rows where BID comes up multiple times in AID?
You can see the sample below, AID and BID columns are under the PrimaryID, and BIDs are under AID. I want to come up with an output that only takes records where BIDs had 1 to many relationship with records on AIDs column. Example output below.
I provided a small sample of data; I am trying to retrieve 20+ columns and joining 4 tables. I have unique PrimaryIDs and under those I have multiple unique AIDs, however under these AIDs I can have multiple non-unique BIDs that can repeatedly come up under different AIDs.
Hive supports window functions. A window function can associate every row in a group with an attribute of the group, and Count() is one of the supported functions. In your case you can use it as a window function and select the rows for which that count > 1.
In the partition by clause you specify which columns define the group, the same way that you would in the more familiar group by clause.
Something like this:
select * from
(
Select *,
count(*) over (partition by primaryID,AID) counts
from mytable
) x
Where counts>1
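
As a rough illustration of that pattern, the sketch below runs the same query shape against a tiny made-up table using Python's sqlite3 module. SQLite stands in for Hive here and the sample rows are invented, so treat it as a demonstration of the windowed count only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (primaryID INTEGER, AID TEXT, BID TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        (1, "A1", "B1"),
        (1, "A1", "B2"),  # second row in the (1, A1) group
        (1, "A2", "B1"),
        (2, "A3", "B3"),
    ],
)

# The inner query attaches the group size to every row of the group;
# the outer query keeps only rows whose group has more than one row.
rows = conn.execute(
    """
    SELECT primaryID, AID, BID, counts
    FROM (
        SELECT *,
               COUNT(*) OVER (PARTITION BY primaryID, AID) AS counts
        FROM mytable
    ) x
    WHERE counts > 1
    ORDER BY primaryID, AID, BID
    """
).fetchall()

for row in rows:
    print(row)
```

Changing the columns in PARTITION BY changes what counts as a group, which is the knob to turn if the relationship you care about is defined on different columns.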

How to get rolling unique count of employees per year based on key fields

I have the following table and I wanted to get the running unique count by dept, team, level. But the cumulative unique count will restart per year.
Note: sorry in my main table example, employee numbers may repeat up to four times. There is another column called leave type but wasn't able to illustrate it in the image
Main table
Expected output would be something like below.
Expected output
Is this possible? Apologies. Not too advanced when it comes to SQL. Thank you.
You can do:
select max(extract(year from date)), date, department, level, team, count(*)
from t
group by date, department, level, team

Select latest and earliest times within a time group and a pivot statement

I have attendance data that contains a username, time, and status (IN or OUT). I want to show attendance data that contains a name, and the check in/out times. I expect a person to check in and out no more than twice a day. The data looks like this:
As you can see, my problem is that one person can have multiple data entries in different seconds for the same login attempt. This is because I get data from a fingerprint attendance scanner, and the machine in some cases makes multiple entries, sometimes just within 5-10 seconds. I want to select the data to be like this:
How can I identify the proper time for the login attempt, and then select the data with a pivot?
First, you need to normalize your data by removing the duplicate entries. In your situation, that's a challenge because the duplicated data isn't easily identified as a duplicate. You can make some assumptions though. Below, I assume that no one will make multiple login attempts in a two minute window. You can do this by first using a Common Table Expression (CTE, using the WITH clause).
Within the CTE, you can use the LAG function. Essentially what this code is saying is "for each partition of user and entry type, if the previous value was within 2 minutes of this value, then put a number, otherwise put null." I chose null as the flag that will keep the value because LAG of the first entry is going to be null. So, your CTE will just return a table of entry events (ID) that were distinct attempts.
Now, you prepare another CTE that a PIVOT will pull from that has everything from your table, but only for the entry IDs you cared about. The PIVOT is going to look over the MIN/MAX of your IN/OUT times.
WITH FLAGGED AS (
SELECT ID,
CASE WHEN LAG(TIME) OVER (PARTITION BY USERNAME, STATUS ORDER BY TIME)
+ (2/60/24) >= TIME THEN 1 ELSE NULL END AS DUP_FLAG -- 1 = within 2 minutes of the previous entry
FROM LOGIN_TABLE ),
UNIQUE_LOGINS AS (
SELECT ID FROM FLAGGED WHERE DUP_FLAG IS NULL ), -- analytic functions are not allowed directly in WHERE
TEMP_FOR_PIVOT AS (
SELECT USERNAME, TIME, STATUS FROM LOGIN_TABLE WHERE ID IN (SELECT ID FROM UNIQUE_LOGINS)
)
SELECT * FROM TEMP_FOR_PIVOT
PIVOT (
MIN(TIME) AS FIRST_TIME, MAX(TIME) AS LAST_TIME FOR STATUS IN ('IN', 'OUT')
)
From there, if you need to rearrange or rename your columns, then you can just put that last SELECT into yet another CTE and then select your values from it. There is some more about PIVOT here: Rotate/pivot table with aggregation in Oracle
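
For anyone who wants to try the de-duplication step in isolation, here is a small sketch using Python's sqlite3 module. It is not Oracle: SQLite has LAG but no PIVOT, the time difference is computed with julianday(), and the table contents are invented. The two-minute window is the same assumption as above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE login_table ("
    "id INTEGER PRIMARY KEY, username TEXT, time TEXT, status TEXT)"
)
conn.executemany(
    "INSERT INTO login_table VALUES (?, ?, ?, ?)",
    [
        (1, "ann", "2019-01-01 08:00:00", "IN"),
        (2, "ann", "2019-01-01 08:00:07", "IN"),   # scanner duplicate, 7 s later
        (3, "ann", "2019-01-01 12:00:00", "OUT"),
        (4, "ann", "2019-01-01 13:00:00", "IN"),
        (5, "ann", "2019-01-01 17:00:00", "OUT"),
        (6, "ann", "2019-01-01 17:00:05", "OUT"),  # scanner duplicate, 5 s later
    ],
)

# Compare each entry with the previous entry of the same user and status.
# Keep it when there is no previous entry or the gap exceeds 2 minutes.
rows = conn.execute(
    """
    WITH flagged AS (
        SELECT id, username, time, status,
               LAG(time) OVER (
                   PARTITION BY username, status ORDER BY time
               ) AS prev_time
        FROM login_table
    )
    SELECT id, status, time
    FROM flagged
    WHERE prev_time IS NULL
       OR (julianday(time) - julianday(prev_time)) * 24 * 60 > 2
    ORDER BY id
    """
).fetchall()

for row in rows:
    print(row)
```

Note that each entry is compared with the entry immediately before it, so a long burst of scans spaced under two minutes apart collapses to its first entry.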

Randomize Return Results - Access

I need to match up an employee with a task in a small Microsoft Access DB I built. Essentially, I have a list of 45 potential tasks, and I have 25 employees. What I need is:
Each employee to have at LEAST one task
No employee to have more than TWO
Be able to randomize the results every time I run the query (so the same people don't get consistently the same tasks)
My table structure is:
Employees - w/ fields: ID, Name
Tasks - w/ fields: ID, Location, Task Group, Task
I know this is a dumb question, but I truly am struggling. I have searched through SO and Google for help but have been unsuccessful.
I don't have a way to link together employees to tasks since each employee is capable of every task, so I was going to:
1. SELECT * from Employees
2. SELECT * from Tasks
3. Union
4. COUNT(Name) <= 2
But I don't know how to randomize those results so that folks are randomly matched up, with each person at least once and nobody more than twice.
Any help or guidance is appreciated. Thank you.
Consider a cross join with an aggregate query that randomizes the choice set. Currently, at 45 X 25 this yields a cartesian product of 1,125 records which is manageable.
Select query (save as a query object, assumes Tasks has autonumber field)
SELECT cj.[Emp_Name], Max(cj.ID) As M_ID, Max(cj.Task) As M_Task
FROM
(SELECT e.[Emp_Name], t.ID, t.Task
FROM Employees e,
Tasks t) cj
GROUP BY cj.[Emp_Name], Rnd(cj.ID)
ORDER BY cj.[Emp_Name], Rnd(cj.ID)
However, the challenge is that the query above randomizes the order of all 45 tasks for each of the 25 employees, whereas you need only the top two tasks per employee. Unfortunately, MS Access does not have a row id like other DBMSs that could be used to select the top 2 per employee. And we cannot use a correlated subquery on Task ID per employee, since that would always return the two highest task IDs by value rather than two random ones.
Therefore to do so in Access, you will need a temp table regularly cleaned out prior to each allocation of employee tasks and use autonumber for selection via correlated subquery.
Create table (run once, autonumber field required)
CREATE TABLE CrossJoinRandomPicks (
ID AUTOINCREMENT PRIMARY KEY,
Emp_Name TEXT(255),
M_ID LONG,
M_Task TEXT(255)
)
Delete query (run regularly)
DELETE FROM CrossJoinRandomPicks;
Append query (run regularly)
INSERT INTO CrossJoinRandomPicks ([Emp_Name], [M_ID], [M_Task])
SELECT [Emp_Name], [M_ID], [M_Task]
FROM mySavedCrossJoinQuery;
Final query (selects top two random tasks for each employee)
SELECT c.Emp_Name, c.M_Task
FROM CrossJoinRandomPicks c
WHERE
(SELECT Count(*) FROM CrossJoinRandomPicks sub
WHERE sub.Emp_Name = c.Emp_Name
AND sub.ID <= c.ID) <= 2;
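
The delete / append / select cycle can be tried outside Access with the sketch below, which uses Python's sqlite3 module. This is an approximation under stated assumptions: SQLite's random() and AUTOINCREMENT stand in for Access's Rnd() and autonumber, the cross join is inlined instead of being a saved query, and the employee and task names are generated placeholders.

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (ID INTEGER PRIMARY KEY, Emp_Name TEXT)")
conn.execute("CREATE TABLE Tasks (ID INTEGER PRIMARY KEY, Task TEXT)")
conn.executemany(
    "INSERT INTO Employees (Emp_Name) VALUES (?)",
    [(f"emp{i}",) for i in range(1, 26)],   # 25 placeholder employees
)
conn.executemany(
    "INSERT INTO Tasks (Task) VALUES (?)",
    [(f"task{i}",) for i in range(1, 46)],  # 45 placeholder tasks
)
conn.execute(
    """
    CREATE TABLE CrossJoinRandomPicks (
        ID INTEGER PRIMARY KEY AUTOINCREMENT,
        Emp_Name TEXT, M_ID INTEGER, M_Task TEXT
    )
    """
)


def draw_picks(conn):
    """Empty the temp table, refill it in shuffled order, return two rows per employee."""
    conn.execute("DELETE FROM CrossJoinRandomPicks")
    # Rows are appended in shuffled order, so the auto-assigned IDs are
    # expected to follow that shuffle.
    conn.execute(
        """
        INSERT INTO CrossJoinRandomPicks (Emp_Name, M_ID, M_Task)
        SELECT e.Emp_Name, t.ID, t.Task
        FROM Employees e CROSS JOIN Tasks t
        ORDER BY random()
        """
    )
    # Keep the two lowest IDs per employee via a correlated subquery.
    return conn.execute(
        """
        SELECT c.Emp_Name, c.M_Task
        FROM CrossJoinRandomPicks c
        WHERE (SELECT COUNT(*) FROM CrossJoinRandomPicks sub
               WHERE sub.Emp_Name = c.Emp_Name AND sub.ID <= c.ID) <= 2
        ORDER BY c.Emp_Name
        """
    ).fetchall()


picks = draw_picks(conn)
per_employee = Counter(name for name, _ in picks)
print(len(picks), "rows;", len(per_employee), "employees")
```

In this sketch every employee ends up with exactly two picks, and the same task can be drawn for more than one employee, so extra filtering would be needed if each task must be assigned only once.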

Select Last Updated Row with condition

I'm working on building a workload tracking system, I have a table that currently has listed all the tasks to be completed (each with a unique ID), but also has all the updates with a datestamp so that I can track how long it took for the status to be updated.
My dilemma is that for a form I want to query only the latest update; currently the select query shows both the original task and the updated task separately.
In words, I guess what I need to do is to select only a task given that the ID is the last one with that same task number (which is different than the ID, there will be duplicates when it is updated)
So if I have:
ID Task Date
1 A 4/30/13
2 B 5/2/13
3 A 5/3/13
That the table only shows:
ID Task Date
3 A 5/3/13
2 B 5/2/13
How can I do this? I think I'm missing something simple...
There are multiple ways to approach this query, even in Access. Here is a way using IN with a subquery:
select t.*
from t
where t.id in (select MAX(id) as maxid
from t
group by task
)
order by task
The subquery finds the maximum ids for all the tasks. It then returns the rows from the original table that match those ids.
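
A quick way to check this against the three sample rows from the question is the sketch below, run through Python's sqlite3 module. SQLite stands in for Access here, and the table is named t as in the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, task TEXT, date TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, "A", "4/30/13"), (2, "B", "5/2/13"), (3, "A", "5/3/13")],
)

# The subquery returns one id per task (the largest); the outer query
# keeps only the rows whose id is in that list.
rows = conn.execute(
    """
    SELECT t.*
    FROM t
    WHERE t.id IN (SELECT MAX(id) FROM t GROUP BY task)
    ORDER BY task
    """
).fetchall()

for row in rows:
    print(row)
```

This relies on a later update always getting a higher id than the row it supersedes, which holds for an autonumber key.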