author | Jiayu Hu <hujiayu@google.com> | 2023-11-30 14:10:57 -0800 |
---|---|---|
committer | Jiayu Hu <hujiayu@google.com> | 2023-11-30 18:24:06 -0800 |
commit | cb6ac3ede1d2ad050895b588417ea353c75953fe (patch) | |
tree | 431aae8a813d3b3e229077175c1363901b870d53 /icing/index/iterator/doc-hit-info-iterator-section-restrict.h | |
parent | be04186537a2e78ef1f27ba646676133d7e83c9a (diff) | |
download | icing-cb6ac3ede1d2ad050895b588417ea353c75953fe.tar.gz |
Update Icing from upstream.
Descriptions:
========================================================================
[Icing][version 3] Bump kVersion to 3
========================================================================
Make lite index magic dependent on `IcingSearchEngineOptions::build_property_existence_metadata_hits`
========================================================================
Add a flag in IcingSearchEngineOptions to control whether to build property existence metadata hits
========================================================================
Support `hasProperty(property_path)` in the advanced query language
========================================================================
Add PropertyExistenceIndexingHandler to index property existence metadata hit
========================================================================
[JoinIndex Improvement][11/x] Add IcingSearchEngine initialization unit test for switching join index
========================================================================
[JoinIndex Improvement][10/x] Change/Add IcingSearchEngine unit tests
========================================================================
[JoinIndex Improvement][9/x] Integrate QualifiedIdJoinIndexImplV2 with IcingSearchEngine
========================================================================
[JoinIndex Improvement][8/x] Integrate QualifiedIdJoinIndexImplV2 with JoinProcessor
========================================================================
[JoinIndex Improvement][8/x] Integrate QualifiedIdJoinIndexImplV2 with QualifiedIdJoinIndexingHandler
========================================================================
[JoinIndex Improvement][7/x] Create QualifiedIdJoinIndex interface
========================================================================
[JoinIndex Improvement][6.1/x] Unit test (Optimize)
========================================================================
[JoinIndex Improvement][6.0/x] Unit test (General, Put, GetIterator)
========================================================================
[JoinIndex Improvement][5.3/x] Implement Optimize
========================================================================
Remove accents from Greek letters in normalizer
========================================================================
Make arm emulator tests build-only.
========================================================================
[JoinIndex Improvement][5.2/x] Implement GetIterator
========================================================================
[JoinIndex Improvement][5.1/x] Implement Put
========================================================================
[JoinIndex Improvement][5.0/x] Branch QualifiedIdJoinIndex to QualifiedIdJoinIndexImplV2
========================================================================
[JoinIndex Improvement][4/x] Implement PostingListJoinDataAccessor
========================================================================
[JoinIndex Improvement][3/x] Implement PostingListJoinDataSerializer and DocumentIdToJoinInfo data type
========================================================================
[JoinIndex Improvement][2/x] Create NamespaceFingerprintIdentifier
========================================================================
[JoinIndex Improvement][1/x] Implement namespace_id_old_to_new in Compaction
========================================================================
Update test to also handle ICU 74 segmentation rules.
========================================================================
[Icing][Expand QueryStats][3/x] Add new fields into QueryStats (1)
========================================================================
[Icing][Expand QueryStats][2/x] Refactor QueryStatsProto
========================================================================
[Icing][Expand QueryStats][1/x] Publish DocHitInfoIterator CallStats
========================================================================
Add additional property filter tests
========================================================================
Deprecate hit_intersect_section_ids_mask in DocHitInfoIterator
========================================================================
Change default requires_full_emulation to False for portable_cc_test (third_party/icing/testing)
========================================================================
Cleanup Set requires_full_emulation to True for selective tests
========================================================================
Fix monkey test failures
========================================================================
Complete monkey test logic to change schema during monkey test runtime
========================================================================
Refactor monkey test to prepare for schema update
========================================================================
Fix the schema bug found by monkey test with seed 2551429844
========================================================================
Move set query stats to the very top of InternalSearch()
========================================================================
Apply section restriction only on leaf nodes
========================================================================
[6/n] Fix callsites in Icing that forgot to check libtextclassifier3::Status (Advanced query parser)
========================================================================
[5/n] Fix callsites in Icing that forgot to check libtextclassifier3::Status (PersistentHashMap)
========================================================================
[4/n] Fix callsites in Icing that forgot to check libtextclassifier3::Status (PostingListIntegerIndexSerializer)
========================================================================
[3/n] Fix callsites in Icing that forgot to check libtextclassifier3::Status (PostingListHitSerializer)
========================================================================
[2/n] Fix callsites in Icing that forgot to check libtextclassifier3::Status (Posting list storage)
========================================================================
[1/n] Fix callsites in Icing that forgot to check libtextclassifier3::Status (Non-functional changes)
========================================================================
Decouple section restriction data from iterators
========================================================================
Fix the crash when a schema type gets more indexable properties than allowed
========================================================================
Add a checker to verify the property data type matches the schema.
========================================================================
Change global std::string in i18n-utils to constexpr std::string_view.
========================================================================
Adjust LiteIndex sort at indexing check conditions.
========================================================================
Bug: 305098009
Bug: 307508735
Bug: 291130542
Bug: 275121148
Bug: 303239901
Bug: 301116242
Bug: 299321977
Bug: 300135897
Bug: 297549761
Bug: 309826655
Bug: 296349369
Bug: 302192690
Bug: 302609704
Bug: 301566713
NO_IFTTT="False Alarm: The path is only valid in G3. kVersion is changed to 3, and schema is compatible with version 1."
Change-Id: I8c4c3cd9b93e5240bd774f0a3d6d812f7a9ec198
Diffstat (limited to 'icing/index/iterator/doc-hit-info-iterator-section-restrict.h')
-rw-r--r-- | icing/index/iterator/doc-hit-info-iterator-section-restrict.h | 100 |
1 files changed, 37 insertions, 63 deletions
diff --git a/icing/index/iterator/doc-hit-info-iterator-section-restrict.h b/icing/index/iterator/doc-hit-info-iterator-section-restrict.h
index 5d44ed7..387ff52 100644
--- a/icing/index/iterator/doc-hit-info-iterator-section-restrict.h
+++ b/icing/index/iterator/doc-hit-info-iterator-section-restrict.h
@@ -17,15 +17,18 @@
 
 #include <cstdint>
 #include <memory>
+#include <set>
 #include <string>
-#include <string_view>
-#include <unordered_map>
+#include <vector>
 
 #include "icing/text_classifier/lib3/utils/base/status.h"
+#include "icing/text_classifier/lib3/utils/base/statusor.h"
 #include "icing/index/iterator/doc-hit-info-iterator.h"
+#include "icing/index/iterator/section-restrict-data.h"
+#include "icing/proto/search.pb.h"
 #include "icing/schema/schema-store.h"
 #include "icing/schema/section.h"
-#include "icing/store/document-filter-data.h"
+#include "icing/store/document-id.h"
 #include "icing/store/document-store.h"
 
 namespace icing {
@@ -38,36 +41,48 @@ namespace lib {
 // That class is meant to be applied to the root of a query tree and filter over
 // all results at the end. This class is more used in the limited scope of a
 // term or a small group of terms.
-class DocHitInfoIteratorSectionRestrict : public DocHitInfoIterator {
+class DocHitInfoIteratorSectionRestrict : public DocHitInfoLeafIterator {
  public:
   // Does not take any ownership, and all pointers must refer to valid objects
   // that outlive the one constructed.
   explicit DocHitInfoIteratorSectionRestrict(
-      std::unique_ptr<DocHitInfoIterator> delegate,
+      std::unique_ptr<DocHitInfoIterator> delegate, SectionRestrictData* data);
+
+  // Methods that apply section restrictions to all DocHitInfoLeafIterator nodes
+  // inside the provided iterator tree, and return the root of the tree
+  // afterwards. These methods do not take any ownership for the raw pointer
+  // parameters, which must refer to valid objects that outlive the iterator
+  // returned.
+  static std::unique_ptr<DocHitInfoIterator> ApplyRestrictions(
+      std::unique_ptr<DocHitInfoIterator> iterator,
       const DocumentStore* document_store, const SchemaStore* schema_store,
       std::set<std::string> target_sections, int64_t current_time_ms);
-
-  explicit DocHitInfoIteratorSectionRestrict(
-      std::unique_ptr<DocHitInfoIterator> delegate,
+  static std::unique_ptr<DocHitInfoIterator> ApplyRestrictions(
+      std::unique_ptr<DocHitInfoIterator> iterator,
       const DocumentStore* document_store, const SchemaStore* schema_store,
-      const SearchSpecProto& search_spec,
-      int64_t current_time_ms);
+      const SearchSpecProto& search_spec, int64_t current_time_ms);
+  static std::unique_ptr<DocHitInfoIterator> ApplyRestrictions(
+      std::unique_ptr<DocHitInfoIterator> iterator, SectionRestrictData* data);
 
   libtextclassifier3::Status Advance() override;
 
   libtextclassifier3::StatusOr<TrimmedNode> TrimRightMostNode() && override;
 
-  int32_t GetNumBlocksInspected() const override;
-
-  int32_t GetNumLeafAdvanceCalls() const override;
+  CallStats GetCallStats() const override { return delegate_->GetCallStats(); }
 
   std::string ToString() const override;
 
-  // Note that the DocHitInfoIteratorSectionRestrict is the only iterator that
-  // should set filtering_section_mask, hence the received
-  // filtering_section_mask is ignored and the filtering_section_mask passed to
-  // the delegate will be set to hit_intersect_section_ids_mask_. This will
-  // allow to filter the matching sections in the delegate.
+  // Note that the DocHitInfoIteratorSectionRestrict can only be applied at
+  // DocHitInfoLeafIterator, which can be a term iterator or another
+  // DocHitInfoIteratorSectionRestrict.
+  //
+  // To filter the matching sections, filtering_section_mask should be set to
+  // doc_hit_info_.hit_section_ids_mask() held in the outermost
+  // DocHitInfoIteratorSectionRestrict, which is equal to the intersection of
+  // all hit_section_ids_mask in the DocHitInfoIteratorSectionRestrict chain,
+  // since for any two section restrict iterators chained together, the outer
+  // one's hit_section_ids_mask is always a subset of the inner one's
+  // hit_section_ids_mask.
   void PopulateMatchedTermsStats(
       std::vector<TermMatchInfo>* matched_terms_stats,
       SectionIdMask filtering_section_mask = kSectionIdMaskAll) const override {
@@ -77,55 +92,14 @@ class DocHitInfoIteratorSectionRestrict : public DocHitInfoIterator {
     }
     delegate_->PopulateMatchedTermsStats(
         matched_terms_stats,
-        /*filtering_section_mask=*/hit_intersect_section_ids_mask_);
+        /*filtering_section_mask=*/filtering_section_mask &
+            doc_hit_info_.hit_section_ids_mask());
   }
 
  private:
-  explicit DocHitInfoIteratorSectionRestrict(
-      std::unique_ptr<DocHitInfoIterator> delegate,
-      const DocumentStore* document_store, const SchemaStore* schema_store,
-      std::unordered_map<std::string, std::set<std::string>>
-          type_property_filters,
-      std::unordered_map<std::string, SectionIdMask> type_property_masks,
-      int64_t current_time_ms);
-  // Calculates the section mask of allowed sections(determined by the property
-  // filters map) for the given schema type and caches the same for any future
-  // calls.
-  //
-  // Returns:
-  // - If type_property_filters_ has an entry for the given schema type or
-  //   wildcard(*), return a bitwise or of section IDs in the schema type that
-  //   that are also present in the relevant filter list.
-  // - Otherwise, return kSectionIdMaskAll.
-  SectionIdMask ComputeAndCacheSchemaTypeAllowedSectionsMask(
-      const std::string& schema_type);
-  // Generates a section mask for the given schema type and the target sections.
-  //
-  // Returns:
-  // - A bitwise or of section IDs in the schema_type that that are also
-  //   present in the target_sections list.
-  // - If none of the sections in the schema_type are present in the
-  //   target_sections list, return kSectionIdMaskNone.
-  // This is done by doing a bitwise or of the target section ids for the given
-  // schema type.
-  SectionIdMask GenerateSectionMask(const std::string& schema_type,
-                                    const std::set<std::string>&
-                                        target_sections) const;
-
   std::unique_ptr<DocHitInfoIterator> delegate_;
-  const DocumentStore& document_store_;
-  const SchemaStore& schema_store_;
-  int64_t current_time_ms_;
-
-  // Map of property filters per schema type. Supports wildcard(*) for schema
-  // type that will apply to all schema types that are not specifically
-  // specified in the mapping otherwise.
-  std::unordered_map<std::string, std::set<std::string>>
-      type_property_filters_;
-  // Mapping of schema type to the section mask of allowed sections for that
-  // schema type. This section mask is lazily calculated based on the specified
-  // property filters and cached for any future use.
-  std::unordered_map<std::string, SectionIdMask> type_property_masks_;
+  // Does not own.
+  SectionRestrictData* data_;
 };
 
 }  // namespace lib
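The new comment on PopulateMatchedTermsStats argues that, in a chain of section restrict iterators, the outermost hit mask equals the intersection of every mask in the chain, so ANDing it into filtering_section_mask is enough. The following is a minimal sketch of that reasoning only. It does not use Icing's real classes: ToySectionMask, ToyIterator, ToyTerm, ToyRestrict and BuildToyChain are hypothetical stand-ins, and the toy restrict iterator computes its mask once at construction, whereas the real iterator presumably updates it per document in Advance().

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <utility>

// Hypothetical stand-in for Icing's SectionIdMask: bit i represents section i.
using ToySectionMask = uint64_t;
constexpr ToySectionMask kToyMaskAll = ~ToySectionMask{0};

// Toy iterator interface; NOT the real DocHitInfoIterator API.
class ToyIterator {
 public:
  virtual ~ToyIterator() = default;
  // Sections in which the current document was hit.
  virtual ToySectionMask HitMask() const = 0;
  // Sections reported as matched once filtering_mask has been applied.
  virtual ToySectionMask MatchedSections(ToySectionMask filtering_mask) const = 0;
};

// Leaf node: a term that hit a fixed set of sections.
class ToyTerm : public ToyIterator {
 public:
  explicit ToyTerm(ToySectionMask hits) : hits_(hits) {}
  ToySectionMask HitMask() const override { return hits_; }
  ToySectionMask MatchedSections(ToySectionMask filtering_mask) const override {
    return hits_ & filtering_mask;
  }

 private:
  ToySectionMask hits_;
};

// Restricting wrapper: its hit mask is the delegate's mask narrowed to the
// allowed sections, so it is always a subset of the delegate's mask.
class ToyRestrict : public ToyIterator {
 public:
  ToyRestrict(std::unique_ptr<ToyIterator> delegate, ToySectionMask allowed)
      : delegate_(std::move(delegate)),
        hit_mask_(delegate_->HitMask() & allowed) {}
  ToySectionMask HitMask() const override { return hit_mask_; }
  // Mirrors the shape of the change in the diff: narrow the received mask by
  // this node's own hit mask before forwarding it to the delegate.
  ToySectionMask MatchedSections(ToySectionMask filtering_mask) const override {
    return delegate_->MatchedSections(filtering_mask & hit_mask_);
  }

 private:
  std::unique_ptr<ToyIterator> delegate_;  // Declared first, initialized first.
  ToySectionMask hit_mask_;
};

// Builds term -> inner restrict -> outer restrict and returns the outermost.
std::unique_ptr<ToyIterator> BuildToyChain() {
  auto term = std::make_unique<ToyTerm>(0b1111);  // hit in sections 0..3
  auto inner =
      std::make_unique<ToyRestrict>(std::move(term), 0b100110);  // allow 1,2,5
  return std::make_unique<ToyRestrict>(std::move(inner), 0b1100);  // allow 2,3
}
```

In this toy chain the inner mask is 0b0110 and the outer mask is 0b0100, which is the intersection of the term's hits with both allowed sets. Calling MatchedSections(kToyMaskAll) on the outermost node therefore reports only section 2, and a caller-supplied mask narrows the result further instead of being ignored.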