What are segments in Lucene?

What are segments in Lucene? - lucene

What are segments in Lucene?
What are the benefits of segments?

The Lucene index is split into smaller chunks called segments. Each segment is its own index. Lucene searches all of them in sequence.
A new segment is created when a new writer is opened and when a writer commits or is closed.
The advantages of using this system are that you never have to modify the files of a segment once it is created. When you are adding new documents in your index, they are added to the next segment. Previous segments are never modified.
Deleting a document is done by simply indicating in a file which document of a segment is deleted, but physically, the document always stays in the segment. Documents in Lucene aren't really updated. What happens is that the previous version of the document is marked as deleted in its original segment and the new version of the document is added to the current segment. This minimizes the chances of corrupting an index by constantly having to modify its content when there are changes. It also allows for easy backup and synchronization of the index across different machines.
However, at some point, Lucene may decide to merge some segments. This operation can also be triggered with an optimize.

A segment is very simply a section of the index. The idea is that you can add documents to the index that's currently being served by creating a new segment with only new documents in it. This way, you don't have to go to the expensive trouble of rebuilding your entire index frequently in order to add new documents to the index.

The segment benefits have been answered already by others. I will include an ascii diagram of a Lucene Index.
Lucene Segment
A Lucene segment is part of an Index. Each segment is composed of several index files. If you look inside any of these files, you will see that it holds 1 or more Lucene documents.
+- Index 5 ------------------------------------------+
| |
| +- Segment _0 ---------------------------------+ |
| | | |
| | +- file 1 -------------------------------+ | |
| | | | | |
| | | +- L.Doc1-+ +- L.Doc2-+ +- L.Doc3-+ | | |
| | | | | | | | | | | |
| | | | field 1 | | field 1 | | field 1 | | | |
| | | | field 2 | | field 2 | | field 2 | | | |
| | | | field 3 | | field 3 | | field 3 | | | |
| | | | | | | | | | | |
| | | +---------+ +---------+ +---------+ | | |
| | | | | |
| | +----------------------------------------+ | |
| | | |
| | | |
| | +- file 2 -------------------------------+ | |
| | | | | |
| | | +- L.Doc4-+ +- L.Doc5-+ +- L.Doc6-+ | | |
| | | | | | | | | | | |
| | | | field 1 | | field 1 | | field 1 | | | |
| | | | field 2 | | field 2 | | field 2 | | | |
| | | | field 3 | | field 3 | | field 3 | | | |
| | | | | | | | | | | |
| | | +---------+ +---------+ +---------+ | | |
| | | | | |
| | +----------------------------------------+ | |
| | | |
| +----------------------------------------------+ |
| |
| +- Segment _1 (optional) ----------------------+ |
| | | |
| +----------------------------------------------+ |
+----------------------------------------------------+
Reference
Lucene in Action Second Edition - July 2010 - Manning Publication

Related

How to get the number of empty cells for each column in spark dataframe

I'd like to get the number of each column's empty value, so I tried
ele_df.where(ele_df['Shipment_ID'].isNotNull()).select('Shipment_ID').show()
But it returns me the empty value, it seems it consider the empty value as a non-null value.
+------------------+
|Shipment_ID|
+------------------+
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
+------------------+
Could you guys help me with this?

Speeding up WHERE EXISTS clause by FETCH NEXT clause

I have a long running Oracle Query which uses a bunch of:
WHERE EXISTS (SELECT NULL FROM Table WHERE TableColumn IN (...))
Instead of using SELECT NULL, which goes through the entire table to find criteria, can't I just put FETCH NEXT 1 ROW ONLY after it since I only care if TableColumn is IN (...)?
Like this:
WHERE EXISTS (SELECT NULL FROM Table WHERE TableColumn IN (...) FETCH NEXT 1 ROW ONLY)
So the WHERE EXISTS would be evaluated quicker.
EDIT:
Below is the query plan without the FETCH NEXT clause attached:
------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost | Time |
------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 75 | 521611 | |
| 1 | SORT AGGREGATE | | 1 | 75 | | |
| 2 | HASH JOIN | | 531266 | 39844950 | 521611 | |
| 3 | TABLE ACCESS FULL | ACCT | 47574 | 523314 | 418 | |
| 4 | HASH JOIN | | 531224 | 33998336 | 521185 | |
| 5 | INDEX FAST FULL SCAN | PK_ACTVTYP | 454 | 2270 | 2 | |
| 6 | HASH JOIN | | 531224 | 31342216 | 521177 | |
| 7 | INDEX FULL SCAN | PK_ACTVCAT | 67 | 335 | 1 | |
| 8 | HASH JOIN SEMI | | 531224 | 28686096 | 521169 | |
| 9 | NESTED LOOPS SEMI | | 531224 | 28686096 | 521169 | |
| 10 | STATISTICS COLLECTOR | | | | | |
| 11 | HASH JOIN RIGHT SEMI | | 531224 | 25498752 | 112887 | |
| 12 | TABLE ACCESS FULL | AMSACTVGRPEMPL | 2364 | 35460 | 10 | |
| 13 | TABLE ACCESS FULL | ACTV | 12779986 | 421739538 | 112712 | |
| 14 | INDEX RANGE SCAN | ACTVSUBACTV_DX2 | 163091724 | 978550344 | 251246 | |
| 15 | INDEX FAST FULL SCAN | ACTVSUBACTV_DX2 | 163091724 | 978550344 | 251246 | |
------------------------------------------------------------------------------------------------
Below is the query plan with the FETCH NEXT clause attached:
------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost | Time |
------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 69 | 113148 | |
| 1 | SORT AGGREGATE | | 1 | 69 | | |
| 2 | FILTER | | | | | |
| 3 | HASH JOIN | | 531221 | 36654249 | 113144 | |
| 4 | TABLE ACCESS FULL | ACCT | 47574 | 523314 | 418 | |
| 5 | HASH JOIN | | 531179 | 30808382 | 112718 | |
| 6 | INDEX FAST FULL SCAN | PK_ACTVTYP | 454 | 2270 | 2 | |
| 7 | HASH JOIN | | 531179 | 28152487 | 112710 | |
| 8 | INDEX FULL SCAN | PK_ACTVCAT | 67 | 335 | 1 | |
| 9 | HASH JOIN RIGHT SEMI | | 531179 | 25496592 | 112702 | |
| 10 | TABLE ACCESS FULL | AMSACTVGRPEMPL | 2167 | 32505 | 10 | |
| 11 | TABLE ACCESS FULL | ACTV | 12778893 | 421703469 | 112527 | |
| 12 | VIEW | | 1 | 13 | 4 | |
| 13 | WINDOW BUFFER PUSHED RANK | | 8 | 48 | 4 | |
| 14 | INDEX RANGE SCAN | ACTVSUBACTV_DX2 | 8 | 48 | 4 | |
------------------------------------------------------------------------------------------------
From what I see, it looks like without the FETCH NEXT it's adding overhead by more TABLE ACCESS FULL
EDIT #2
Adding AND ROWNUM = 1 instead of FETCH NEXT 1 ROW ONLY:
------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost | Time |
------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 54 | 128114 | |
| 1 | SORT AGGREGATE | | 1 | 54 | | |
| 2 | FILTER | | | | | |
| 3 | HASH JOIN | | 12779902 | 690114708 | 113296 | |
| 4 | TABLE ACCESS FULL | ACCT | 47574 | 523314 | 418 | |
| 5 | HASH JOIN | | 12778893 | 549492399 | 112713 | |
| 6 | MERGE JOIN CARTESIAN | | 30418 | 304180 | 31 | |
| 7 | INDEX FULL SCAN | PK_ACTVCAT | 67 | 335 | 1 | |
| 8 | BUFFER SORT | | 454 | 2270 | 30 | |
| 9 | INDEX FAST FULL SCAN | PK_ACTVTYP | 454 | 2270 | 0 | |
| 10 | TABLE ACCESS FULL | ACTV | 12778893 | 421703469 | 112517 | |
| 11 | COUNT STOPKEY | | | | | |
| 12 | INLIST ITERATOR | | | | | |
| 13 | INDEX UNIQUE SCAN | PK_AMSACTVGRPEMPL | 1 | 15 | 2 | |
| 14 | COUNT STOPKEY | | | | | |
| 15 | INDEX RANGE SCAN | ACTVSUBACTV_DX2 | 2 | 12 | 4 | |
------------------------------------------------------------------------------------------------

The FETCH NEXT is new in 12c, and to avoid the performance issue causing it add
hint like below
WHERE EXISTS (SELECT /*+ first_rows(1)*/* FROM Table WHERE TableColumn IN (...) FETCH NEXT 1 ROW ONLY)
try it and check its query plan
Note: I recommend to add indexes on table ACCT ,ACTV to enhance its performance.

Changing the orientation of a SplitView

I'm making a UWP (Windows 10) app. I'd like to know, is it possible to change the orientation of a SplitView? Typically, it's ordered like this:
______________________________________________
| | |
| | |
| | |
| | |
| | |
| Pane | Content |
| | |
| | |
| | |
| | |
| | |
----------------------------------------------
Is it possible to change the orientation to:
______________________________________________
| |
| |
| Pane |
| |
| |
| |
----------------------------------------------
| |
| |
| |
| |
| Content |
| |
| |
| |
----------------------------------------------

It is not supported by the platform (SplitVew.PanePlacement property can only be left or right).
You can likely achieve a somewhat similar affect by placing a command bar at the top of your application.

Inefficient SQL Search Query - Oracle DB

My logic in this query is right (well im 80% sure it is). but its been running for 2h 23min and still going, was wondering if some one could maybe help me make this run a bit more efficiently as i don't think its that intense of a query
SELECT b.bridge_no, COUNT(*) AS comment_cnt
FROM iacd_asset b INNER JOIN iacd_note c
ON REGEXP_LIKE(c.comments, '(^|\W)BN' || b.bridge_no || '(\W|$)', 'i')
inner join ncr_note e on c.note_id=e.note_id
inner join ncr f on e.ncr_id=f.ncr_id
inner join ncr_iac g on f.ncr_id=g.ncr_id
WHERE c.create_dt >= date'2015-01-01'
AND c.create_dt < date'2015-03-12'
AND length(b.bridge_no) > 1
AND g.scheme in (1, 3, 5, 6, 7, 8, 9, 9, and about 10 more values)
GROUP BY b.bridge_no
ORDER BY comment_cnt;
in short the query should be making a bunch of joins, and then filtering the joined table by schemes (g.scheme in....) , and then parsing the notes field for anything with BN in it.
PLAN TABLE, ok i have never used one before, but i believe this is the plan table
+------------------+----------------+--------------+------------------+--------------+-----------------+----------------+-----------+----+-----------+-------+----------+---------+-------------+-------------+-----------------+---------+------------+-----------------------------+---------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------+------+--------------+
| OPERATION | OPTIONS | OBJECT_OWNER | OBJECT_NAME | OBJECT_ALIAS | OBJECT_INSTANCE | OBJECT_TYPE | OPTIMIZER | ID | PARENT_ID | DEPTH | POSITION | COST | CARDINALITY | BYTES | CPU_COST | IO_COST | TEMP_SPACE | ACCESS_PREDICATES | FILTER_PREDICATES | PROJECTION | TIME | QBLOCK_NAME |
+------------------+----------------+--------------+------------------+--------------+-----------------+----------------+-----------+----+-----------+-------+----------+---------+-------------+-------------+-----------------+---------+------------+-----------------------------+---------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------+------+--------------+
| SELECT STATEMENT | | | | | | | ALL_ROWS | 0 | | 0 | 281,503 | 281,503 | 40 | 4,480 | 148,378,917,975 | 215,677 | | | | | 458 | |
+------------------+----------------+--------------+------------------+--------------+-----------------+----------------+-----------+----+-----------+-------+----------+---------+-------------+-------------+-----------------+---------+------------+-----------------------------+---------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------+------+--------------+
| SORT | ORDER BY | | | | | | | 1 | 0 | 1 | 1 | 281,503 | 40 | 4,480 | 148,378,917,975 | 215,677 | | | | (#keys=1) COUNT(*)[22], "B"."BRIDGE_NO"[NUMBER,22] | 458 | SEL$81719215 |
+------------------+----------------+--------------+------------------+--------------+-----------------+----------------+-----------+----+-----------+-------+----------+---------+-------------+-------------+-----------------+---------+------------+-----------------------------+---------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------+------+--------------+
| HASH | GROUP BY | | | | | | | 2 | 1 | 2 | 1 | 281,503 | 40 | 4,480 | 148,378,917,975 | 215,677 | | | | (#keys=1) "B"."BRIDGE_NO"[NUMBER,22], COUNT(*)[22] | 458 | |
+------------------+----------------+--------------+------------------+--------------+-----------------+----------------+-----------+----+-----------+-------+----------+---------+-------------+-------------+-----------------+---------+------------+-----------------------------+---------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------+------+--------------+
| HASH JOIN | | | | | | | | 3 | 2 | 3 | 1 | 281,497 | 16,084 | 1,801,408 | 148,366,537,976 | 215,677 | 24,126,000 | "G"."NCR_ID"="F"."NCR_ID" | | (#keys=1) "B"."BRIDGE_NO"[NUMBER,22] | 458 | |
+------------------+----------------+--------------+------------------+--------------+-----------------+----------------+-----------+----+-----------+-------+----------+---------+-------------+-------------+-----------------+---------+------------+-----------------------------+---------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------+------+--------------+
| HASH JOIN | | | | | | | | 4 | 3 | 4 | 1 | 96,996 | 209,778 | 21,607,134 | 13,549,630,814 | 90,985 | 22,725,000 | "E"."NCR_ID"="F"."NCR_ID" | | (#keys=1) "F"."NCR_ID"[NUMBER,22], "B"."BRIDGE_NO"[NUMBER,22] | 158 | |
+------------------+----------------+--------------+------------------+--------------+-----------------+----------------+-----------+----+-----------+-------+----------+---------+-------------+-------------+-----------------+---------+------------+-----------------------------+---------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------+------+--------------+
| HASH JOIN | | | | | | | | 5 | 4 | 5 | 1 | 42,595 | 208,419 | 20,216,643 | 5,484,063,163 | 40,162 | 9,839,000 | "C"."NOTE_ID"="E"."NOTE_ID" | REGEXP_LIKE ("C"."COMMENTS",'(^|\W)BN'||TO_CHAR("B"."BRIDGE_NO")||'(\W|$)','i') | (#keys=1) "B"."BRIDGE_NO"[NUMBER,22], "E"."NCR_ID"[NUMBER,22] | 70 | |
+------------------+----------------+--------------+------------------+--------------+-----------------+----------------+-----------+----+-----------+-------+----------+---------+-------------+-------------+-----------------+---------+------------+-----------------------------+---------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------+------+--------------+
| PARTITION RANGE | SINGLE | | | | | | | 6 | 5 | 6 | 1 | 1,039 | 104,603 | 8,577,446 | 62,280,224 | 1,011 | | | | "C"."NOTE_ID"[NUMBER,22], "C"."COMMENTS"[VARCHAR2,4000] | 2 | |
+------------------+----------------+--------------+------------------+--------------+-----------------+----------------+-----------+----+-----------+-------+----------+---------+-------------+-------------+-----------------+---------+------------+-----------------------------+---------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------+------+--------------+
| TABLE ACCESS | FULL | IACDB | IACD_NOTE | C#SEL$1 | 2 | TABLE | ANALYZED | 7 | 6 | 7 | 1 | 1,039 | 104,603 | 8,577,446 | 62,280,224 | 1,011 | | | "C"."CREATE_DATE"<TO_DATE(' 2014-12-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss') | "C"."NOTE_ID"[NUMBER,22], "C"."COMMENTS"[VARCHAR2,4000] | 2 | SEL$81719215 |
+------------------+----------------+--------------+------------------+--------------+-----------------+----------------+-----------+----+-----------+-------+----------+---------+-------------+-------------+-----------------+---------+------------+-----------------------------+---------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------+------+--------------+
| MERGE JOIN | CARTESIAN | | | | | | | 8 | 5 | 6 | 2 | 24,267 | 12,268,270 | 184,024,050 | 2,780,501,758 | 23,033 | | | | (#keys=0) "B"."BRIDGE_NO"[NUMBER,22], "E"."NCR_ID"[NUMBER,22], "E"."NOTE_ID"[NUMBER,22] | 40 | |
+------------------+----------------+--------------+------------------+--------------+-----------------+----------------+-----------+----+-----------+-------+----------+---------+-------------+-------------+-----------------+---------+------------+-----------------------------+---------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------+------+--------------+
| TABLE ACCESS | FULL | IACDB | IACD_ASSET | B#SEL$1 | 1 | TABLE | ANALYZED | 9 | 8 | 7 | 1 | 7 | 40 | 160 | 560,542 | 7 | | | LENGTH(TO_CHAR("B"."BRIDGE_NO"))>1 | "B"."BRIDGE_NO"[NUMBER,22] | 1 | SEL$81719215 |
+------------------+----------------+--------------+------------------+--------------+-----------------+----------------+-----------+----+-----------+-------+----------+---------+-------------+-------------+-----------------+---------+------------+-----------------------------+---------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------+------+--------------+
| BUFFER | SORT | | | | | | | 10 | 8 | 7 | 2 | 24,259 | 308,248 | 3,390,728 | 2,779,941,216 | 23,026 | | | | (#keys=0) "E"."NCR_ID"[NUMBER,22], "E"."NOTE_ID"[NUMBER,22] | 40 | |
+------------------+----------------+--------------+------------------+--------------+-----------------+----------------+-----------+----+-----------+-------+----------+---------+-------------+-------------+-----------------+---------+------------+-----------------------------+---------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------+------+--------------+
| TABLE ACCESS | FULL | IACDB | IACD_NCR_NOTE | E#SEL$2 | 4 | TABLE | ANALYZED | 11 | 10 | 8 | 1 | 606 | 308,248 | 3,390,728 | 69,498,530 | 576 | | | | "E"."NCR_ID"[NUMBER,22], "E"."NOTE_ID"[NUMBER,22] | 1 | SEL$81719215 |
+------------------+----------------+--------------+------------------+--------------+-----------------+----------------+-----------+----+-----------+-------+----------+---------+-------------+-------------+-----------------+---------+------------+-----------------------------+---------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------+------+--------------+
| INDEX | FAST FULL SCAN | IACDB | PK_IACDNCR_NCRID | F#SEL$3 | | INDEX (UNIQUE) | ANALYZED | 12 | 4 | 5 | 2 | 31,763 | 22,838,996 | 137,033,976 | 3,248,120,913 | 30,322 | | | | "F"."NCR_ID"[NUMBER,22] | 52 | SEL$81719215 |
+------------------+----------------+--------------+------------------+--------------+-----------------+----------------+-----------+----+-----------+-------+----------+---------+-------------+-------------+-----------------+---------+------------+-----------------------------+---------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------+------+--------------+
| TABLE ACCESS | FULL | IACDB | IACD_NCR_IAC | G#SEL$4 | 8 | TABLE | ANALYZED | 13 | 3 | 4 | 2 | 181,461 | 1,731,062 | 15,579,558 | 134,407,812,606 | 121,833 | | | ALL THE SCHEMES CHCECKS | "G"."NCR_ID"[NUMBER,22] | 295 | SEL$81719215 |
+------------------+----------------+--------------+------------------+--------------+-----------------+----------------+-----------+----+-----------+-------+----------+---------+-------------+-------------+-----------------+---------+------------+-----------------------------+---------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------+------+--------------+
Hopefully thats legible enough
interms of indexes i assume only the fields that are being sorted is importent
crate_dt is not indexed
scheme id is indexed
Maybe my order in query is wrong...

The plan shows you're doing FULL TABLE SCAN of IACD_NOTE and IACD_ASSET, and then doing a CARETESIAN join of them, because you have provided no criteria for linking one record in IACD_ASSET to a set of records in IACD_NOTE.
That's not my definition of a non-intense query, and the eye-popping values for CPU cost bear that out.
You need to replace this ..,
FROM iacd_asset b INNER JOIN iacd_note c
ON REGEXP_LIKE(c.comments, '(^|\W)BN' || b.bridge_no || '(\W|$)', 'i')
... with an actual join on indexed columns. It would be helpful if Notes were linked to Assets by a foreign key of BRIDGE_NO or similar. I don't know your data model. Then you can use that regex as an additional filter in the WHERE clause.
Also you join to three further tables, to get to something which allows an additional filter on SCHEME. Again, I don't know your data model but this seems pretty inefficient.
Unfortunately this is the sort of tuning which relies on domain knowledge. Fixing this query requires understanding of the data - its volume, distribution and skew, the data model itself and the business logic your query implements. This is way beyond the scope of the advice we can offer in StackOverflow.
One thing to consider, but it is a big decision would be to index the comments with a free text index. However, that has lots of ramifications (especially space and database admin). Find out more.

Setting the position of a tab Scrollbar when updaing the VerticalScroll Value

I am updating a number of controls dynamically on a tabpage within a c# application. The tab page has autoscroll enabled. As you may be aware, when the user has scrolled down it sets the location x=0,y=0 to be at the current x=0,y=0.
so:
+-----------------------------------+
| X |---| X is now 0,0 in terms of setting locations.
| | - |
| |---|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| TEXT | |
| | |
+-----------------------------------+
AND
+-----------------------------------+
| X | | X is now 0,0 in terms of setting locations.
| TEXT | |
| | |
| | |
| | |
| |---|
| | - |
| |---|
| | |
| | |
| | |
| | |
+-----------------------------------+
Therefore I have to save the VerticalScroll.Value, set it to 0, update the controls and then set the VerticalScroll.Value back to the original value.
This works to a degree, the position of the tab form is correct, however the scrollbar doesn't update, so it looks like this:
+-----------------------------------+
| |---|
| TEXT | - |
| |---|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
+-----------------------------------+
Is there any easy way to update the position of the scrollbar?
If I leave it as it is, when the user clicks on the scrollbar again it moves the position back to the top of the tab form. Thanks for any help you can provide.

To answer my own question:
http://msdn.microsoft.com/en-us/library/system.windows.forms.scrollablecontrol.autoscrollposition.aspx

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

What are segments in Lucene? - lucene

What are segments in Lucene? What are the benefits of segments?

Related

How to get the number of empty cells for each column in spark dataframe

Speeding up WHERE EXISTS clause by FETCH NEXT clause

Changing the orientation of a SplitView

Inefficient SQL Search Query - Oracle DB

Setting the position of a tab Scrollbar when updaing the VerticalScroll Value

Categories

Resources