path: root/icing/index/iterator/doc-hit-info-iterator-section-restrict.h
author Jiayu Hu <hujiayu@google.com> 2023-11-30 14:10:57 -0800
committer Jiayu Hu <hujiayu@google.com> 2023-11-30 18:24:06 -0800
commit cb6ac3ede1d2ad050895b588417ea353c75953fe
tree 431aae8a813d3b3e229077175c1363901b870d53 /icing/index/iterator/doc-hit-info-iterator-section-restrict.h
parent be04186537a2e78ef1f27ba646676133d7e83c9a
Update Icing from upstream.
Descriptions:
========================================================================
[Icing][version 3] Bump kVersion to 3
========================================================================
Make lite index magic dependent on `IcingSearchEngineOptions::build_property_existence_metadata_hits`
========================================================================
Add a flag in IcingSearchEngineOptions to control whether to build property existence metadata hits
========================================================================
Support `hasProperty(property_path)` in the advanced query language
========================================================================
Add PropertyExistenceIndexingHandler to index property existence metadata hit
========================================================================
[JoinIndex Improvement][11/x] Add IcingSearchEngine initialization unit test for switching join index
========================================================================
[JoinIndex Improvement][10/x] Change/Add IcingSearchEngine unit tests
========================================================================
[JoinIndex Improvement][9/x] Integrate QualifiedIdJoinIndexImplV2 with IcingSearchEngine
========================================================================
[JoinIndex Improvement][8/x] Integrate QualifiedIdJoinIndexImplV2 with JoinProcessor
========================================================================
[JoinIndex Improvement][8/x] Integrate QualifiedIdJoinIndexImplV2 with QualifiedIdJoinIndexingHandler
========================================================================
[JoinIndex Improvement][7/x] Create QualifiedIdJoinIndex interface
========================================================================
[JoinIndex Improvement][6.1/x] Unit test (Optimize)
========================================================================
[JoinIndex Improvement][6.0/x] Unit test (General, Put, GetIterator)
========================================================================
[JoinIndex Improvement][5.3/x] Implement Optimize
========================================================================
Remove accents from Greek letters in normalizer
========================================================================
Make arm emulator tests build-only.
========================================================================
[JoinIndex Improvement][5.2/x] Implement GetIterator
========================================================================
[JoinIndex Improvement][5.1/x] Implement Put
========================================================================
[JoinIndex Improvement][5.0/x] Branch QualifiedIdJoinIndex to QualifiedIdJoinIndexImplV2
========================================================================
[JoinIndex Improvement][4/x] Implement PostingListJoinDataAccessor
========================================================================
[JoinIndex Improvement][3/x] Implement PostingListJoinDataSerializer and DocumentIdToJoinInfo data type
========================================================================
[JoinIndex Improvement][2/x] Create NamespaceFingerprintIdentifier
========================================================================
[JoinIndex Improvement][1/x] Implement namespace_id_old_to_new in Compaction
========================================================================
Update test to also handle ICU 74 segmentation rules.
========================================================================
[Icing][Expand QueryStats][3/x] Add new fields into QueryStats (1)
========================================================================
[Icing][Expand QueryStats][2/x] Refactor QueryStatsProto
========================================================================
[Icing][Expand QueryStats][1/x] Publish DocHitInfoIterator CallStats
========================================================================
Add additional property filter tests
========================================================================
Deprecate hit_intersect_section_ids_mask in DocHitInfoIterator
========================================================================
Change default requires_full_emulation to False for portable_cc_test (third_party/icing/testing)
========================================================================
Cleanup Set requires_full_emulation to True for selective tests
========================================================================
Fix monkey test failures
========================================================================
Complete monkey test logic to change schema during monkey test runtime
========================================================================
Refactor monkey test to prepare for schema update
========================================================================
Fix the schema bug found by monkey test with seed 2551429844
========================================================================
Move set query stats to the very top of InternalSearch()
========================================================================
Apply section restriction only on leaf nodes
========================================================================
[6/n] Fix callsites in Icing that forgot to check libtextclassifier3::Status (Advanced query parser)
========================================================================
[5/n] Fix callsites in Icing that forgot to check libtextclassifier3::Status (PersistentHashMap)
========================================================================
[4/n] Fix callsites in Icing that forgot to check libtextclassifier3::Status (PostingListIntegerIndexSerializer)
========================================================================
[3/n] Fix callsites in Icing that forgot to check libtextclassifier3::Status (PostingListHitSerializer)
========================================================================
[2/n] Fix callsites in Icing that forgot to check libtextclassifier3::Status (Posting list storage)
========================================================================
[1/n] Fix callsites in Icing that forgot to check libtextclassifier3::Status (Non-functional changes)
========================================================================
Decouple section restriction data from iterators
========================================================================
Fix the crash when a schema type gets more indexable properties than allowed
========================================================================
Add a checker to verify the property data type matches the schema.
========================================================================
Change global std::string in i18n-utils to constexpr std::string_view.
========================================================================
Adjust LiteIndex sort at indexing check conditions.
========================================================================

Bug: 305098009
Bug: 307508735
Bug: 291130542
Bug: 275121148
Bug: 303239901
Bug: 301116242
Bug: 299321977
Bug: 300135897
Bug: 297549761
Bug: 309826655
Bug: 296349369
Bug: 302192690
Bug: 302609704
Bug: 301566713
NO_IFTTT="False Alarm: The path is only valid in G3. kVersion is changed to 3, and schema is compatible with version 1."
Change-Id: I8c4c3cd9b93e5240bd774f0a3d6d812f7a9ec198
Diffstat (limited to 'icing/index/iterator/doc-hit-info-iterator-section-restrict.h')
-rw-r--r-- icing/index/iterator/doc-hit-info-iterator-section-restrict.h | 100
1 files changed, 37 insertions, 63 deletions
diff --git a/icing/index/iterator/doc-hit-info-iterator-section-restrict.h b/icing/index/iterator/doc-hit-info-iterator-section-restrict.h
index 5d44ed7..387ff52 100644
--- a/icing/index/iterator/doc-hit-info-iterator-section-restrict.h
+++ b/icing/index/iterator/doc-hit-info-iterator-section-restrict.h
@@ -17,15 +17,18 @@
#include <cstdint>
#include <memory>
+#include <set>
#include <string>
-#include <string_view>
-#include <unordered_map>
+#include <vector>
#include "icing/text_classifier/lib3/utils/base/status.h"
+#include "icing/text_classifier/lib3/utils/base/statusor.h"
#include "icing/index/iterator/doc-hit-info-iterator.h"
+#include "icing/index/iterator/section-restrict-data.h"
+#include "icing/proto/search.pb.h"
#include "icing/schema/schema-store.h"
#include "icing/schema/section.h"
-#include "icing/store/document-filter-data.h"
+#include "icing/store/document-id.h"
#include "icing/store/document-store.h"
namespace icing {
@@ -38,36 +41,48 @@ namespace lib {
// That class is meant to be applied to the root of a query tree and filter over
// all results at the end. This class is more used in the limited scope of a
// term or a small group of terms.
-class DocHitInfoIteratorSectionRestrict : public DocHitInfoIterator {
+class DocHitInfoIteratorSectionRestrict : public DocHitInfoLeafIterator {
public:
// Does not take any ownership, and all pointers must refer to valid objects
// that outlive the one constructed.
explicit DocHitInfoIteratorSectionRestrict(
- std::unique_ptr<DocHitInfoIterator> delegate,
+ std::unique_ptr<DocHitInfoIterator> delegate, SectionRestrictData* data);
+
+ // Methods that apply section restrictions to all DocHitInfoLeafIterator nodes
+ // inside the provided iterator tree, and return the root of the tree
+ // afterwards. These methods do not take any ownership for the raw pointer
+ // parameters, which must refer to valid objects that outlive the iterator
+ // returned.
+ static std::unique_ptr<DocHitInfoIterator> ApplyRestrictions(
+ std::unique_ptr<DocHitInfoIterator> iterator,
const DocumentStore* document_store, const SchemaStore* schema_store,
std::set<std::string> target_sections, int64_t current_time_ms);
-
- explicit DocHitInfoIteratorSectionRestrict(
- std::unique_ptr<DocHitInfoIterator> delegate,
+ static std::unique_ptr<DocHitInfoIterator> ApplyRestrictions(
+ std::unique_ptr<DocHitInfoIterator> iterator,
const DocumentStore* document_store, const SchemaStore* schema_store,
- const SearchSpecProto& search_spec,
- int64_t current_time_ms);
+ const SearchSpecProto& search_spec, int64_t current_time_ms);
+ static std::unique_ptr<DocHitInfoIterator> ApplyRestrictions(
+ std::unique_ptr<DocHitInfoIterator> iterator, SectionRestrictData* data);
libtextclassifier3::Status Advance() override;
libtextclassifier3::StatusOr<TrimmedNode> TrimRightMostNode() && override;
- int32_t GetNumBlocksInspected() const override;
-
- int32_t GetNumLeafAdvanceCalls() const override;
+ CallStats GetCallStats() const override { return delegate_->GetCallStats(); }
std::string ToString() const override;
- // Note that the DocHitInfoIteratorSectionRestrict is the only iterator that
- // should set filtering_section_mask, hence the received
- // filtering_section_mask is ignored and the filtering_section_mask passed to
- // the delegate will be set to hit_intersect_section_ids_mask_. This will
- // allow to filter the matching sections in the delegate.
+ // Note that the DocHitInfoIteratorSectionRestrict can only be applied at
+ // DocHitInfoLeafIterator, which can be a term iterator or another
+ // DocHitInfoIteratorSectionRestrict.
+ //
+ // To filter the matching sections, filtering_section_mask should be set to
+ // doc_hit_info_.hit_section_ids_mask() held in the outermost
+ // DocHitInfoIteratorSectionRestrict, which is equal to the intersection of
+ // all hit_section_ids_mask in the DocHitInfoIteratorSectionRestrict chain,
+ // since for any two section restrict iterators chained together, the outer
+ // one's hit_section_ids_mask is always a subset of the inner one's
+ // hit_section_ids_mask.
void PopulateMatchedTermsStats(
std::vector<TermMatchInfo>* matched_terms_stats,
SectionIdMask filtering_section_mask = kSectionIdMaskAll) const override {
@@ -77,55 +92,14 @@ class DocHitInfoIteratorSectionRestrict : public DocHitInfoIterator {
}
delegate_->PopulateMatchedTermsStats(
matched_terms_stats,
- /*filtering_section_mask=*/hit_intersect_section_ids_mask_);
+ /*filtering_section_mask=*/filtering_section_mask &
+ doc_hit_info_.hit_section_ids_mask());
}
private:
- explicit DocHitInfoIteratorSectionRestrict(
- std::unique_ptr<DocHitInfoIterator> delegate,
- const DocumentStore* document_store, const SchemaStore* schema_store,
- std::unordered_map<std::string, std::set<std::string>>
- type_property_filters,
- std::unordered_map<std::string, SectionIdMask> type_property_masks,
- int64_t current_time_ms);
- // Calculates the section mask of allowed sections(determined by the property
- // filters map) for the given schema type and caches the same for any future
- // calls.
- //
- // Returns:
- // - If type_property_filters_ has an entry for the given schema type or
- // wildcard(*), return a bitwise or of section IDs in the schema type that
- // that are also present in the relevant filter list.
- // - Otherwise, return kSectionIdMaskAll.
- SectionIdMask ComputeAndCacheSchemaTypeAllowedSectionsMask(
- const std::string& schema_type);
- // Generates a section mask for the given schema type and the target sections.
- //
- // Returns:
- // - A bitwise or of section IDs in the schema_type that that are also
- // present in the target_sections list.
- // - If none of the sections in the schema_type are present in the
- // target_sections list, return kSectionIdMaskNone.
- // This is done by doing a bitwise or of the target section ids for the given
- // schema type.
- SectionIdMask GenerateSectionMask(const std::string& schema_type,
- const std::set<std::string>&
- target_sections) const;
-
std::unique_ptr<DocHitInfoIterator> delegate_;
- const DocumentStore& document_store_;
- const SchemaStore& schema_store_;
- int64_t current_time_ms_;
-
- // Map of property filters per schema type. Supports wildcard(*) for schema
- // type that will apply to all schema types that are not specifically
- // specified in the mapping otherwise.
- std::unordered_map<std::string, std::set<std::string>>
- type_property_filters_;
- // Mapping of schema type to the section mask of allowed sections for that
- // schema type. This section mask is lazily calculated based on the specified
- // property filters and cached for any future use.
- std::unordered_map<std::string, SectionIdMask> type_property_masks_;
+ // Does not own.
+ SectionRestrictData* data_;
};
} // namespace lib
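
The new comment on PopulateMatchedTermsStats argues that, in a chain of section restrict iterators, the mask that finally reaches the term iterator equals the intersection of every hit_section_ids_mask in the chain. The sketch below models that reasoning with plain bitmasks. It is not Icing code: SectionIdMask is reduced to a uint64_t alias, and RestrictHitMask, ForwardedFilterMask, and MaskReachingTerm are hypothetical helpers introduced only for illustration. It also assumes that each restrict level reports its delegate's hit mask narrowed by that level's allowed sections, which is what makes the outer mask a subset of the inner one.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for illustration; the real Icing types are richer.
using SectionIdMask = uint64_t;
constexpr SectionIdMask kSectionIdMaskAll = ~SectionIdMask{0};

// Assumption: a restrict level reports the delegate's hit mask narrowed by
// the sections that level allows.
constexpr SectionIdMask RestrictHitMask(SectionIdMask delegate_hit_mask,
                                        SectionIdMask allowed_mask) {
  return delegate_hit_mask & allowed_mask;
}

// Mirrors the expression in the diff: the incoming filtering_section_mask is
// intersected with this level's own hit mask before being passed down.
constexpr SectionIdMask ForwardedFilterMask(
    SectionIdMask filtering_section_mask, SectionIdMask own_hit_mask) {
  return filtering_section_mask & own_hit_mask;
}

// Walks a two-level chain (term -> inner restrict -> outer restrict) and
// returns the filtering mask that finally reaches the term iterator.
inline SectionIdMask MaskReachingTerm(SectionIdMask term_hit_mask,
                                      SectionIdMask inner_allowed,
                                      SectionIdMask outer_allowed) {
  const SectionIdMask inner_hits = RestrictHitMask(term_hit_mask, inner_allowed);
  const SectionIdMask outer_hits = RestrictHitMask(inner_hits, outer_allowed);
  // The outer level's hit mask is a subset of the inner level's hit mask.
  assert((outer_hits & inner_hits) == outer_hits);
  // The caller at the root passes the default, kSectionIdMaskAll.
  const SectionIdMask to_inner =
      ForwardedFilterMask(kSectionIdMaskAll, outer_hits);
  return ForwardedFilterMask(to_inner, inner_hits);
}
```

Because each level only ever clears bits, intersecting on the way down cannot widen the mask, so the term iterator sees exactly the outermost level's hit mask. This is consistent with replacing the previously forwarded hit_intersect_section_ids_mask_ by filtering_section_mask & doc_hit_info_.hit_section_ids_mask().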