Select random record within groups sqlite - sql

I can't seem to get my head around this. I have a single table in SQlite, from which I need to select a random() record for EACH group. So, considering a table such as:
id link chunk
2 a me1
3 b me1
4 c me1
5 d you2
6 e you2
7 f you2
I need sql that will return a random link value for each chunk. So one time I run it would give:
me1 | a
you2 | f
the next time maybe
me1 | c
you2 | d
I know similar questions have been answered but I'm not finding a derivation of one that applies here.
UPDATE:
Nuts, follow up question: so now I need to EXCLUDE rows where a new field "qcinfo" is set to 'Y'.
This, of course, hides rows whenever the random ID hits one where qcinfo = 'Y', which is wrong. I need to exclude the row from being considered in the chunk, but still generate a random record for the chunk if any records have qcinfo <> 'Y'.
select t.chunk ,t.id, t.qcinfo, t.link from table1
inner join
(
select chunk ,cast(min(id)+abs(random() % (max(id)-min(id)))as int) AS random_id
from table1
group by chunk
) sq
on t.chunk = sq.chunk
and t.id = sq.random_id
where qcinfo <> 'Y'

A bit hackish, but it works... See sql fiddle http://sqlfiddle.com/#!2/81e75/7
select t.chunk
,t.link
from table1 t
inner join
(
select chunk
,FLOOR(min(id) + RAND() * (max(id)-min(id))) AS random_id
from table1
group by chunk
) sq
on t.chunk = sq.chunk
and t.id = sq.random_id
Sorry, I thought that you said MySQL.
Here is the fiddle and the code for SQLite
http://sqlfiddle.com/#!5/81e75/12
select t.chunk
,t.link
from table1 t
inner join
(
select chunk
,cast(min(id)+abs(random() % (max(id)-min(id)))as int) AS random_id
from table1
group by chunk
) sq
on t.chunk = sq.chunk
and t.id = sq.random_id

Note that SQLite returns the first value corresponding to a group when we do a group by without an aggregation
select link, chunk from table group by chunk;
Running that you'll get this
me1 | a
you2| d
Now you can make the first value random by randomly sorting the table and then grouping. Here's the final solution.
select link, chunk from (select * from table order by random()) group by chunk;

Related

Don't select rows where column A is duplicated AND any row of column B is a specific value

I'm working on generating a report merging multiple tables. The report requires only showing projects that did not have any document marked 'Not Received' These document markings are listed in a table that lists each document in an individual line. So when merged into my other table it creates multiple rows of the same project. For example the following table
Project Number
ChecklistValue
565
Received
565
Not Received
465
Received
465
Not Applicable
As you can see really only two projects are listed on this table but the desired output is:
Project Number
Other Info
465
etc
I do not need the checklist value on the actual report, so I can use the GROUP BY to combine all the good rows, but where I have an Issue is that would still include project 565 even if I include something like where ChecklistValue <> 'Not Received', 565 needs to be hidden from the report entirely because any row for 565 contains 'Not Received'.
So that's my actual question, how do I exclude all project numbers rows that have any row containing 'Not Received'?
I'm adding the entire query will generalized names below:
SELECT
Project Number
,Name
,Contractor
,ABS(DATEDIFF(day,(ActualDate),(EstDate))) AS DelayPeriod
,S.NoteDate
,S.FinalAppDate
,Status
,S.ONE
,S.TWO
,S.THREE
,S.FOUR
,CH.ChecklistValue
FROM [DB1] A
INNER JOIN [DB2] C ON A.Contractor = C.Contractor
INNER JOIN [DB3] S ON A.AppID = S.AppID
INNER JOIN [DB4] LS ON S.StatusID = LS.StatusID
LEFT OUTER JOIN [DB5] CH ON A.AppID = CH.AppID AND CH.OtherID = 1
WHERE C.TypeID = 4 AND A.YEAR = 2022, AND S.THING = 1 AND
(CH.CheckListValue IS NULL OR A.AppID NOT IN (SELECT * FROM [DB5] WHERE
CheckListValue = 'Not Reveived'))
GROUP BY Project Number,Name,Contractor,ABS(DATEDIFF(day,(ActualDate),(EstDate))) AS DelayPeriod,S.NoteDate,S.FinalAppDate,Status,S.ONE,S.TWO,S.THREE,S.FOUR
The last portion of the WHERE clause was added from a suggestion, but I'm clearly not implementing it correctly as it errors
You can use not in like:
create table test(
num int,
description varchar(20)
);
insert into test(num,description)
values(565,'Received'),
(565,'Not Received'),
(465,'Received'),
(465,'Not Applicable');
select *
from test
where num not in
(
select num -- Only select one column here
from test
where description = 'Not Received'
);
Results:
+-----+---------------+
| num | description |
+-----+---------------+
| 465 | Received |
| 465 | Not Applicable|
+-----+---------------+
db<>fiddle this is on sql-server but works on other dbms as well.
So in your query you should have (in my understanding):
OR A.AppID NOT IN
(
SELECT AppID -- Not select *
FROM [DB5]
WHERE CheckListValue = 'Not Reveived'
)
Other way to do it is with a cte but it is complicated at first glance:
with x as(
select num
from test
where description = 'Not Received'
)
select t.num, t.description
from test t
left join x
on t.num = x.num
where x.num is null
I'm first creating a cte on the num column where the description = not received then I'm selecting all from the test table, and I'm left joining to the cte but I'm only selecting the num column that are not in the cte by using where x.num is null, and this will only return 465.
Now which one is better? I don't know sometimes join would be faster and sometimes in, for more you can find on this post.

Optimizing SQL Cross Join that checks if any array value in other column

Let's say I have a table events with structure:
id
value_array
XXXX
[a,b,c,d]
...
...
I have a second table values_of_interest with structure:
value
x
y
z
a
I want to find id's that have any of the values found in values_of_interest. All else equal, what would be the most performant SQL to make this happen? (I am using BigQuery, but feel free to answer more generally)
My current thought is:
SELECT
DISTINCT e.id
FROM
events e, values_of_interest vi
WHERE
EXISTS(
SELECT
value
FROM
UNNEST(e.value_array) value
JOIN
vi ON vi.value = e.value
)
Few quick options for BigQuery Standard SQL
Option 1
select id
from `project.dataset.events`
where exists (
select 1
from `project.dataset.values_of_interest`
where value in unnest(value_array)
)
Option 2
select id
from `project.dataset.events` t
where (
select count(1)
from t.value_array as value
join `project.dataset.values_of_interest`
using(value)
) > 0
I would write this using exists and a join:
select e.id
from `project.dataset.events` e
where exists (select 1
from unnest(e.value_array) val join
`project.dataset.values_of_interest` voi
on val = voi.value
);

Returning a number when result set is null

Each lot object contains a corresponding list of work orders. These work orders have tasks assigned to them which are structured by the task set on the lots parent (the phase). I am trying to get the LOT_ID back and a count of TASK_ID where the TASK_ID is found to exist for the where condition.
The problem is if the TASK_ID is not found, the result set is null and the LOT_ID is not returned at all.
I have uploaded a single row for LOT, PHASE, and WORK_ORDER to the following SQLFiddle. I would have added more data but there is a fun limiter .. err I mean character limiter to the editor.
SQLFiddle
SELECT W.[LOT_ID], COUNT(*) AS NUMBER_TASKS_FOUND
FROM [PHASE] P
JOIN [LOT] L ON L.[PHASE_ID] = P.[PHASE_ID]
JOIN [WORK_ORDER] W ON W.[LOT_ID] = L.[LOT_ID]
WHERE P.[TASK_SET_ID] = 1 AND W.[TASK_ID] = 41
GROUP BY W.[LOT_ID]
The query returns the expected result when the task id is found (46) but no result when the task id is not found (say 41). I'd expect in that case to see something like:
+--------+--------------------+
| LOT_ID | NUMBER_TASKS_FOUND |
+--------+--------------------+
| 500 | 0 |
| 506 | 0 |
+--------+--------------------+
I have a feeling this needs to be wrapped in a sub-query and then joined but I am uncertain what the syntax would be here.
My true objective is to be able to pass a list of TASK_ID and get back any LOT_ID that doesn't match, but for now I am just doing a query per task until I can figure that out.
You want to see all lots with their counts for the task. So either outer join the tasks or cross apply their count or use a subquery in the select clause.
select l.lot_id, count(wo.work_order_id) as number_tasks_found
from lot l
left join work_order wo on wo.lot_id = l.lot_id and wo.task_id = 41
where l.phase_id in (select p.phase_id from phase p where p.task_set_id = 1)
group by l.lot_id
order by l.lot_id;
or
select l.lot_id, w.number_tasks_found
from lot l
cross apply
(
select count(*) as number_tasks_found
from work_order wo
where wo.lot_id = l.lot_id
and wo.task_id = 41
) w
where l.phase_id in (select p.phase_id from phase p where p.task_set_id = 1)
order by l.lot_id;
or
select l.lot_id,
(
select count(*)
from work_order wo
where wo.lot_id = l.lot_id
and wo.task_id = 41
) as number_tasks_found
from lot l
where l.phase_id in (select p.phase_id from phase p where p.task_set_id = 1)
order by l.lot_id;
Another option would be to outer join the count and use COALESCE to turn null into zero in your result.

SQL Update Skipping duplicates

Table 1 looks like the following.
ID SIZE TYPE SERIAL
1 4 W-meter1 123456
2 5 W-meter2 123456
3 4 W-meter 585858
4 4 W-Meter 398574
As you can see. Items 1 and 2 both have the same Serial Number. I have an innerjoin update statement that will update the UniqueID on these devices based on linking their serial number to the list.
What I would like to do. Is modify by hand the items with duplicate serial numbers and scripted update the ones that are unique. Im presuming I have to reference the distinct command here somewhere buy not sure.
This is my update statement as is. Pretty simple and straight forward.
update UM00400
Set um00400.umEquipmentID = tb2.MIUNo
from UM00400 tb1
inner join AA_Meters tb2 on
tb1.umSerialNumber = tb2.Old_Serial_Num
where tb1.umSerialNumber <> tb2.New_Serial_Num
;WITH CTE
AS
(
SELECT * , rn = ROW_NUMBER() OVER (PARTITION BY SERIAL ORDER BY SERIAL)
FROM UM00400
)
UPDATE CTE
SET CTE.umEquipmentID = tb2.MIUNo
inner join AA_Meters tb2
on CTE.umSerialNumber = tb2.Old_Serial_Num
where tb1.umSerialNumber <> tb2.New_Serial_Num
AND CTE.rn = 1
This will update the 1st record of multiple records with the same SERIAL.
If i understand your question correctly below query will help you out :
;WITH CTE AS
(
// getting those serial numbers which are not duplicated
SELECT umSerialNumber,COUNT(umSerialNumber) as CountOfSerialNumber
FROM UM00400
GROUP BY umSerialNumber
HAVING COUNT(umSerialNumber) = 1
)
UPDATE A SET A.umEquipmentID = C.MIUNo
FROM UM00400 A
INNER JOIN CTE B ON A.umSerialNumber = B.umSerialNumber
INNER JOIN AA_Meters C ON A.umSerialNumber = C.Old_Serial_Num

Tricky SQLite query, could use some assistance

I have a rather confusing SQLite query that I can't seem to quite wrap my brain around.
I have the following four tables:
Table "S"
sID (string/guid) | sNum (integer)
-----------------------------------
aaa-aaa 1
bbb-bbb 2
ccc-ccc 3
ddd-ddd 4
eee-eee 5
fff-fff 6
ggg-ggg 7
Table "T"
tID (string/guid) | ... other stuff
-----------------------------------
000
www
xxx
yyy
zzz
Table "S2TMap"
sID | tID
-------------------
aaa-aaa 000
bbb-bbb 000
ccc-ccc xxx
ddd-ddd yyy
eee-eee www
fff-fff 000
ggg-ggg 000
Table "temp"
oldID (string/guid) | newID (string/guid)
------------------------------------------
dont care fff-fff
dont care ggg-ggg
dont care zzz
What I need is to be able to get the MAX() sNum that exists in a specified "t" if the sID doesn't exist in the temp.NewID table.
For example, given the T '000', '000' has S 'aaa-aaa', 'bbb-bbb', 'fff-fff', and 'ggg-ggg' mapped to it. However, both 'fff-fff' and 'ggg-ggg' exist in the TEMP table, which means I need to only look at 'aaa-aaa' and 'bbb-bbb'. Thus, the statement would return "2".
How would I go about doing this?
I was thinking something along the lines of the following for selecting s that don't exist in the "temp" table, but I'm not sure how to get the max of the seat and only do it based on a specific 't'
SELECT s.sID, s.sNum FROM s WHERE NOT EXISTS ( SELECT newID from temp where tmp.newID = s.sID)
Thanks!
Give this a try:
select max(s.sNum) result from s2tmap st
join s on st.sId = s.sId
where st.tId = '000' and not exists (
select * from temp
where temp.newId = st.sId)
Here is the fiddle to play with.
Another option, probably less efficient would be:
select max(s.sNum) result from s2tmap st
join s on st.sId = s.sId
where st.tId = '000' and st.sId not in (
select newId from temp)
The following query should give you a list of Ts and their max sNums (as long as all exist in S and S2TMap):
SELECT t.tID, MAX(sNum)
FROM S s
JOIN S2TMap map on s.sID=map.sID
JOIN T t on map.tId=t.tID
LEFT JOIN temp tmp on s.sID=tmp.newID
WHERE tmp.newID IS NULL
You were close, you just had to join on S2TMap and then to T in order to restrict the result set to a given T.
SELECT MAX(s.sNum)
FROM s
INNER JOIN S2TMap m on m.sID = s.sID
INNER JOIN t on t.tID = m.tID
WHERE t.tID = '000'
AND NOT EXISTS (
SELECT newID FROM temp WHERE temp.newID = s.sID
)