perf: use `bytes.Index` for substring search #104

dkharms · 2025-09-01T14:30:30Z

Description

Here's the benchmarks for KMP and bytes.Index (I excluded comparison of heap allocation since both version had zero heap allocations).

goos: linux
goarch: amd64
pkg: github.com/ozontech/seq-db/pattern
cpu: 12th Gen Intel(R) Core(TM) i5-12600K
                                              │    kmp.txt    │              bytes.txt               │
                                              │    sec/op     │    sec/op     vs base                │
FindSequence_Deterministic/regular-cases-0-16   19.605n ±  7%   8.456n ±  1%  -56.87% (p=0.000 n=10)
FindSequence_Deterministic/regular-cases-1-16   14.305n ±  2%   5.590n ±  1%  -60.92% (p=0.000 n=10)
FindSequence_Deterministic/corner-cases-0-16     91.35n ±  3%   22.86n ±  1%  -74.98% (p=0.000 n=10)
FindSequence_Deterministic/corner-cases-1-16    167.70n ±  2%   28.44n ±  1%  -83.04% (p=0.000 n=10)
FindSequence_Deterministic/corner-cases-2-16    2482.5n ±  1%   181.8n ±  1%  -92.68% (p=0.000 n=10)
FindSequence_Deterministic/corner-cases-3-16    39.509µ ±  1%   2.663µ ±  1%  -93.26% (p=0.000 n=10)
FindSequence_Random/tiny-16                      63.56n ± 34%   12.10n ± 11%  -80.96% (p=0.000 n=10)
FindSequence_Random/small-16                    190.70n ±  8%   23.70n ± 39%  -87.57% (p=0.000 n=10)
FindSequence_Random/medium-16                   681.70n ± 25%   41.71n ± 28%  -93.88% (p=0.000 n=10)
FindSequence_Random/large-16                    9804.5n ±  2%   538.2n ±  7%  -94.51% (p=0.000 n=10)
FindSequence_Random/extra-large-16              603.54µ ±  1%   58.99µ ±  2%  -90.23% (p=0.000 n=10)
geomean                                          702.6n         94.94n        -86.49%

                                   │    kmp.txt    │                bytes.txt                 │
                                   │      B/s      │      B/s        vs base                  │
FindSequence_Random/tiny-16          960.4Mi ± 52%   5044.3Mi ± 12%   +425.25% (p=0.000 n=10)
FindSequence_Random/small-16         1.250Gi ±  9%   10.424Gi ± 33%   +733.58% (p=0.000 n=10)
FindSequence_Random/medium-16        1.402Gi ± 20%   22.882Gi ± 32%  +1531.58% (p=0.000 n=10)
FindSequence_Random/large-16         1.556Gi ±  2%   28.363Gi ±  7%  +1722.42% (p=0.000 n=10)
FindSequence_Random/extra-large-16   1.618Gi ±  1%   16.555Gi ±  2%   +923.15% (p=0.000 n=10)
geomean                              1.329Gi          14.07Gi         +959.02%

So we see that on regular cases we have around 53% of speedup by using bytes.Index. And on large strings we see around of 90% speedup (I believe that's because we build LPS for needles, not the haystack).

Here are features of my CPU (it supports both SSE4.2 and AVX2):

Flags:                                   fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 **sse4_2** x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 **avx2** smep bmi2 erms invpcid rdseed adx smap clflushopt clwb intel_pt sha_ni xsaveopt xsavec xgetbv1 xsaves split_lock_detect user_shstk avx_vnni dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp hwp_pkg_req hfi vnmi umip pku ospke waitpkg gfni vaes vpclmulqdq rdpid movdiri movdir64b fsrm md_clear serialize pconfig arch_lbr ibt flush_l1d arch_capabilities

Closes #98

I have read and followed all requirements in CONTRIBUTING.md;
I used LLM/AI assistance to make this pull request;

If you have used LLM/AI assistance please provide model name and full prompt:

Model: {{model-name}}
Prompt: {{prompt}}

Summary by CodeRabbit

Refactor
- Switched substring handling to a leaner byte-oriented implementation for multi-part pattern searches, simplifying internals and improving search performance.
Tests
- Replaced legacy substring tests with new sequence-focused tests and added deterministic and randomized benchmarks plus data generators to strengthen validation and performance coverage.

codecov-commenter · 2025-09-01T14:35:29Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 71.23%. Comparing base (09d9579) to head (fe6da01).

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #104      +/-   ##
==========================================
- Coverage   71.50%   71.23%   -0.27%     
==========================================
  Files         201      200       -1     
  Lines       18170    18143      -27     
==========================================
- Hits        12993    12925      -68     
- Misses       4457     4500      +43     
+ Partials      720      718       -2

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

coderabbitai · 2025-09-01T14:37:22Z

Note

Reviews paused

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

📝 Walkthrough

Walkthrough

Replaces KMP-based substring matcher with a bytes.Index-based findSequence, removes pattern/substring.go and its tests, updates wildcardSearch.middle from []*substring to [][]byte (using util.StringToByteUnsafe), and adds new findSequence tests and benchmarks (test generator contains a loop bug).

Changes

Cohort / File(s)	Summary
Core algorithm & pattern update `pattern/substring.go` (deleted), `pattern/pattern.go`	Deleted KMP-based `substring` implementation and helpers; changed `wildcardSearch.middle` from `[]*substring` to `[][]byte`; replaced substring-pattern construction with `val := util.StringToByteUnsafe(terms[i].Data)` and `s.middle = append(s.middle, val)`; adjusted `middleLen` to use `len(val)`; added `findSequence(haystack []byte, needles [][]byte) int` implemented with `bytes.Index`. Introduces unsafe string→[]byte conversion.
Tests & benchmarks `pattern/substring_test.go` (deleted), `pattern/pattern_test.go`	Removed all KMP-specific unit tests. Added `TestFindSequence`, `BenchmarkFindSequence_Deterministic`, `BenchmarkFindSequence_Random`, and test/benchmark helpers (`generateTestData`, `generateRandomBytes`, `bb`). Note: `generateTestData` currently uses `for i := range needleCount` (invalid) and will not compile as-is.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Pre-merge checks (4 passed, 1 warning)

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check	✅ Passed	The title “perf: use `bytes.Index` for substring search” precisely and concisely captures the primary change of replacing the custom KMP implementation with the standard library’s bytes.Index for improved performance. It highlights the key optimization without extraneous details, enabling a quick understanding of the PR’s focus. This aligns directly with the PR objectives around performance.
Linked Issues Check	✅ Passed	The changes fully satisfy issue #98 by removing the KMP-based substring logic in pattern/substring.go, updating pattern/pattern.go to use bytes.Index in findSequence, and adding targeted tests and benchmarks to validate and measure the new implementation’s performance. All modifications directly implement the linked issue’s objective of leveraging vectorized instructions and the Rabin–Karp fallback in bytes.Index. There are no outstanding requirements from the issue left unaddressed.
Out of Scope Changes Check	✅ Passed	All file changes pertain solely to replacing the KMP algorithm with bytes.Index, removing the old substring abstraction and associated tests, and adding new findSequence tests and benchmarks; no unrelated features or files are modified. Every change maps back to the performance objective defined in the linked issue. There are no extraneous or out-of-scope alterations.

✨ Finishing touches

🧪 Generate unit tests

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch 98-bytes-index

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 3

🧹 Nitpick comments (4)

pattern/substring.go (1)
7-16: Tighten the loop and avoid double indexing.

Minor cleanup and a tiny win: range with index+value and use val for bytes.Index. No behavior change.
-func findSequence(haystack []byte, needles [][]byte) int {
-	for cur := range needles {
-		val := needles[cur]
-		start := bytes.Index(haystack, needles[cur])
-		if start == -1 {
-			return cur
-		}
-		haystack = haystack[start+len(val):]
-	}
-	return len(needles)
-}
+func findSequence(haystack []byte, needles [][]byte) int {
+	for i, val := range needles {
+		start := bytes.Index(haystack, val)
+		if start == -1 {
+			return i
+		}
+		haystack = haystack[start+len(val):]
+	}
+	return len(needles)
+}
Do we ever expect empty needles? If yes, current logic treats them as “found” without advancing. If that’s undesirable, we should skip/forbid empties and add a unit test.
pattern/pattern.go (3)
78-81: Unsafe aliasing needs lifetime pinning (keep original strings alive).

middle now holds [][]byte produced via an unsafe string→[]byte view. To avoid GC lifetime hazards and future accidental mutation, keep the original strings alongside the byte slices.
 type wildcardSearch struct {
   baseSearch
   prefix    []byte
   suffix    []byte
-  middle    [][]byte
+  middle    [][]byte
+  // keep original strings to pin GC lifetime when using unsafe conversions
+  middleKeepAlive []string
   middleLen int
   narrowed  bool
 }
99-102: Pin the source strings when using StringToByteUnsafe.

Store the source string to ensure it stays alive for the duration of the search. Also document the read-only invariant.
-      val := util.StringToByteUnsafe(terms[i].Data)
-      s.middle = append(s.middle, val)
-      s.middleLen += len(val)
+      str := terms[i].Data
+      val := util.StringToByteUnsafe(str) // read-only view; do not mutate
+      s.middle = append(s.middle, val)
+      s.middleKeepAlive = append(s.middleKeepAlive, str) // pin lifetime
+      s.middleLen += len(val)
If avoiding any unsafe GC nuance is preferred, switch to a safe copy:
-      val := util.StringToByteUnsafe(terms[i].Data)
+      val := []byte(terms[i].Data) // safe copy
151-152: Guard slice bounds locally (defensive check).

checkMiddle guards length globally, but a local bound check keeps findSequence call obviously safe and future-proof if conditions change.
-  return findSequence(val[len(s.prefix):len(val)-len(s.suffix)], s.middle) == len(s.middle)
+  lo, hi := len(s.prefix), len(val)-len(s.suffix)
+  if hi < lo {
+    return false
+  }
+  return findSequence(val[lo:hi], s.middle) == len(s.middle)

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

MCP integration is disabled by default for public repositories
Jira integration is disabled by default for public repositories
Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 035c094 and 2b9cefd.

📒 Files selected for processing (3)

pattern/pattern.go (2 hunks)
pattern/substring.go (1 hunks)
pattern/substring_test.go (1 hunks)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: test

🔇 Additional comments (3)

pattern/substring_test.go (3)

24-40: Tests look good.

Nice coverage on order, overlaps, and corner growth cases.

125-131: LGTM.

Simple and fast random byte generator.

133-135: LGTM.

Nice helper for readability in deterministic cases.

pattern/substring_test.go

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (3)

pattern/substring_test.go (3)

72-79: Use the canonical benchmark loop; drop redundant conversion.

Replace b.Loop() and []byte(c.haystack).

-      b.Run(tc.name+"-"+strconv.Itoa(i), func(b *testing.B) {
-        for b.Loop() {
-          findSequence([]byte(c.haystack), c.needles)
-        }
-      })
+      b.Run(tc.name+"-"+strconv.Itoa(i), func(b *testing.B) {
+        b.ReportAllocs()
+        for i := 0; i < b.N; i++ {
+          findSequence(c.haystack, c.needles)
+        }
+      })

97-108: Stabilize random bench; set bytes once.

Move SetBytes out of the loop and use the canonical loop.

   b.Run(size.name, func(b *testing.B) {
     haystack, needles := generateTestData(
       size.haystackSize, size.needleSize, size.needleCount, 256,
     )
-    b.ResetTimer()
-    for b.Loop() {
-      findSequence(haystack, needles)
-      b.SetBytes(int64(len(haystack)))
-    }
+    b.ReportAllocs()
+    b.SetBytes(int64(len(haystack)))
+    b.ResetTimer()
+    for i := 0; i < b.N; i++ {
+      findSequence(haystack, needles)
+    }
   })

111-123: Fix compile error and bound checks in data generator.

Looping over int with range doesn’t compile; rand.Intn(len(haystack)-needleSize) can panic; copy must be bounded.

 func generateTestData(haystackSize, needleSize, needleCount, charset int) ([]byte, [][]byte) {
   haystack := generateRandomBytes(haystackSize, charset)

   needles := make([][]byte, needleCount)
-  for i := range needleCount {
-    pattern := generateRandomBytes(needleSize, charset)
-    pos := rand.Intn(len(haystack) - needleSize)
-    copy(haystack[pos:], pattern)
-    needles[i] = pattern
-  }
+  if needleSize > haystackSize {
+    needleSize = haystackSize
+  }
+  for i := 0; i < needleCount; i++ {
+    pattern := generateRandomBytes(needleSize, charset)
+    if needleSize > 0 {
+      pos := rand.Intn(len(haystack)-needleSize+1)
+      copy(haystack[pos:pos+needleSize], pattern)
+    }
+    needles[i] = pattern
+  }

   return haystack, needles
 }

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

MCP integration is disabled by default for public repositories
Jira integration is disabled by default for public repositories
Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 2b9cefd and abb9bb5.

📒 Files selected for processing (3)

pattern/pattern.go (2 hunks)
pattern/substring.go (1 hunks)
pattern/substring_test.go (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

pattern/substring.go

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: test

🔇 Additional comments (1)

pattern/pattern.go (1)
342-349: Ignore empty-range wrap-around risk
baseSearch.first is int(tp.FirstTID()) (≥1), and util.BinSearchInRange never returns below that, so on “not found”
s.last = s.first - 1  
yields s.last ≥ 0. Casting these non-negative ints to uint32 can’t wrap.

Likely an incorrect or invalid review comment.

pattern/pattern.go

pattern/substring.go

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (3)

pattern/pattern_test.go (3)

487-515: Good new coverage; consider a couple more edge tests.

Add cases for empty haystack and/or empty needles to lock in boundary behavior (e.g., [], ["a"], [""]).

517-556: Nit: avoid redundant []byte conversion; optionally set bytes once.

c.haystack is already []byte. Also it helps to set bytes outside the loop.

Apply this diff:

-            b.Run(tc.name+"-"+strconv.Itoa(i), func(b *testing.B) {
-                for b.Loop() {
-                    findSequence([]byte(c.haystack), c.needles)
-                }
-            })
+            b.Run(tc.name+"-"+strconv.Itoa(i), func(b *testing.B) {
+                b.SetBytes(int64(len(c.haystack)))
+                for b.Loop() {
+                    findSequence(c.haystack, c.needles)
+                }
+            })

558-584: SetBytes should be outside the loop.

It’s constant per sub-benchmark; moving it reduces per-iter overhead.

Apply this diff:

-        b.Run(size.name, func(b *testing.B) {
+        b.Run(size.name, func(b *testing.B) {
             haystack, needles := generateTestData(
                 size.haystackSize, size.needleSize, size.needleCount, 256,
             )
-            b.ResetTimer()
+            b.SetBytes(int64(len(haystack)))
+            b.ResetTimer()
             for b.Loop() {
                 findSequence(haystack, needles)
-                b.SetBytes(int64(len(haystack)))
             }
         })

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between abb9bb5 and c3b99d1.

📒 Files selected for processing (4)

pattern/pattern.go (3 hunks)
pattern/pattern_test.go (2 hunks)
pattern/substring.go (0 hunks)
pattern/substring_test.go (0 hunks)

💤 Files with no reviewable changes (2)

pattern/substring_test.go
pattern/substring.go

🚧 Files skipped from review as they are similar to previous changes (1)

pattern/pattern.go

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: test

🔇 Additional comments (4)

pattern/pattern_test.go (4)

9-9: Import looks correct.

Needed for strings.Repeat in tests/benchmarks.

600-607: LGTM.

Simple, deterministic byte generator for the given charset.

608-610: LGTM.

Handy helper to keep benchmarks tidy.

517-556: No action needed: go.mod and CI workflows already require Go 1.24+

pattern/pattern_test.go

coderabbitai

Actionable comments posted: 2

♻️ Duplicate comments (1)

pattern/pattern_test.go (1)

586-598: Fix compile error and off-by-one; guard oversized needles in generateTestData.

Looping over an int won’t compile; current pos calc can panic and excludes the last valid start.

 func generateTestData(haystackSize, needleSize, needleCount, charset int) ([]byte, [][]byte) {
 	haystack := generateRandomBytes(haystackSize, charset)
 
 	needles := make([][]byte, needleCount)
-	for i := range needleCount {
-		pattern := generateRandomBytes(needleSize, charset)
-		pos := rand.Intn(len(haystack) - needleSize)
+	for i := 0; i < needleCount; i++ {
+		plen := needleSize
+		if plen > len(haystack) {
+			plen = len(haystack)
+		}
+		pattern := generateRandomBytes(plen, charset)
+		maxStart := len(haystack) - plen
+		pos := 0
+		if maxStart > 0 {
+			pos = rand.Intn(maxStart + 1)
+		}
 		copy(haystack[pos:], pattern)
 		needles[i] = pattern
 	}
 
 	return haystack, needles
 }

🧹 Nitpick comments (1)

pattern/pattern_test.go (1)

487-494: Preallocate needlesB to avoid slice growth (tiny test micro-opt).

Pre-sizing eliminates reallocations in the helper.

-func testFindSequence(a *assert.Assertions, cnt int, needles []string, haystack string) {
-	var needlesB [][]byte
-	for _, needle := range needles {
-		needlesB = append(needlesB, []byte(needle))
-	}
+func testFindSequence(a *assert.Assertions, cnt int, needles []string, haystack string) {
+	needlesB := make([][]byte, len(needles))
+	for i, needle := range needles {
+		needlesB[i] = []byte(needle)
+	}

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c3b99d1 and 258a412.

📒 Files selected for processing (4)

pattern/pattern.go (3 hunks)
pattern/pattern_test.go (2 hunks)
pattern/substring.go (0 hunks)
pattern/substring_test.go (0 hunks)

💤 Files with no reviewable changes (2)

pattern/substring_test.go
pattern/substring.go

🚧 Files skipped from review as they are similar to previous changes (1)

pattern/pattern.go

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: test

pattern/pattern_test.go

coderabbitai

Actionable comments posted: 0

♻️ Duplicate comments (3)

pattern/pattern_test.go (3)

517-556: Fix non-existent testing.B.Loop and drop redundant []byte cast; set bytes once.

This won’t compile (b.Loop doesn’t exist). Also []byte(c.haystack) is redundant; c.haystack is already []byte.

 for _, tc := range testCases {
   for i, c := range tc.cases {
     b.Run(tc.name+"-"+strconv.Itoa(i), func(b *testing.B) {
-      for b.Loop() {
-        findSequence([]byte(c.haystack), c.needles)
-      }
+      b.SetBytes(int64(len(c.haystack)))
+      b.ResetTimer()
+      for i := 0; i < b.N; i++ {
+        findSequence(c.haystack, c.needles)
+      }
     })
   }
 }

558-584: Fix benchmark loop and SetBytes placement.

Use the standard b.N loop; move SetBytes outside the per-iteration body.

 b.Run(size.name, func(b *testing.B) {
   haystack, needles := generateTestData(
     size.haystackSize, size.needleSize, size.needleCount, 256,
   )
-  b.ResetTimer()
-  for b.Loop() {
-    findSequence(haystack, needles)
-    b.SetBytes(int64(len(haystack)))
-  }
+  b.SetBytes(int64(len(haystack)))
+  b.ResetTimer()
+  for i := 0; i < b.N; i++ {
+    findSequence(haystack, needles)
+  }
 })

586-598: Fix compile error, off-by-one, and guard oversize needles.

for i := range needleCount won’t compile (cannot range over int).
rand.Intn(len(haystack)-needleSize) can panic and excludes the last valid start.
Allow needles longer than haystack by clamping length.

 func generateTestData(haystackSize, needleSize, needleCount, charset int) ([]byte, [][]byte) {
   haystack := generateRandomBytes(haystackSize, charset)

   needles := make([][]byte, needleCount)
-  for i := range needleCount {
-    pattern := generateRandomBytes(needleSize, charset)
-    pos := rand.Intn(len(haystack) - needleSize)
+  for i := 0; i < needleCount; i++ {
+    plen := needleSize
+    if plen > len(haystack) {
+      plen = len(haystack)
+    }
+    pattern := generateRandomBytes(plen, charset)
+    maxStart := len(haystack) - plen
+    pos := 0
+    if maxStart > 0 {
+      pos = rand.Intn(maxStart + 1) // include last valid index
+    }
     copy(haystack[pos:], pattern)
     needles[i] = pattern
   }

   return haystack, needles
 }

🧹 Nitpick comments (1)

pattern/pattern_test.go (1)

487-495: Helper looks good; preallocate slice capacity.

Minor nit: avoid reallocations when building needlesB.

 func testFindSequence(a *assert.Assertions, cnt int, needles []string, haystack string) {
-	var needlesB [][]byte
+	needlesB := make([][]byte, 0, len(needles))
 	for _, needle := range needles {
 		needlesB = append(needlesB, []byte(needle))
 	}
 	res := findSequence([]byte(haystack), needlesB)
 	a.Equal(cnt, res, "wrong total number of matches")
 }

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 258a412 and 098bcb7.

📒 Files selected for processing (4)

pattern/pattern.go (3 hunks)
pattern/pattern_test.go (2 hunks)
pattern/substring.go (0 hunks)
pattern/substring_test.go (0 hunks)

💤 Files with no reviewable changes (2)

pattern/substring_test.go
pattern/substring.go

🚧 Files skipped from review as they are similar to previous changes (1)

pattern/pattern.go

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: test

🔇 Additional comments (4)

pattern/pattern_test.go (4)

9-9: Import of strings is appropriate.

Used by strings.Repeat in tests/benchmarks. LGTM.

496-515: Tests cover useful cases.

Nice spread of scenarios; no issues spotted.

600-606: LGTM.

Simple, efficient generator for benchmarks.

608-610: LGTM.

Tiny helper keeps test cases readable.

dkharms · 2025-09-11T11:21:18Z

@coderabbitai pause

coderabbitai · 2025-09-11T11:21:26Z

✅ Actions performed

Reviews paused.

dkharms self-assigned this Sep 1, 2025

dkharms added benchmarks Features or improvements over our benchmark processes performance Features or improvements that positively affect seq-db performance labels Sep 1, 2025

coderabbitai bot reviewed Sep 1, 2025

View reviewed changes

pattern/substring_test.go Outdated Show resolved Hide resolved

pattern/substring_test.go Outdated Show resolved Hide resolved

pattern/substring_test.go Outdated Show resolved Hide resolved

dkharms force-pushed the 98-bytes-index branch from 2b9cefd to abb9bb5 Compare September 2, 2025 17:10

coderabbitai bot reviewed Sep 2, 2025

View reviewed changes

pattern/pattern.go Show resolved Hide resolved

ssnd requested review from moflotas and eguguchkin September 8, 2025 10:44

eguguchkin approved these changes Sep 9, 2025

View reviewed changes

pattern/substring.go Outdated Show resolved Hide resolved

pattern/substring.go Outdated Show resolved Hide resolved

eguguchkin reviewed Sep 9, 2025

View reviewed changes

pattern/substring.go Outdated Show resolved Hide resolved

dkharms force-pushed the 98-bytes-index branch from abb9bb5 to c3b99d1 Compare September 10, 2025 08:47

dkharms enabled auto-merge (squash) September 10, 2025 08:49

coderabbitai bot reviewed Sep 10, 2025

View reviewed changes

pattern/pattern_test.go Show resolved Hide resolved

dkharms force-pushed the 98-bytes-index branch from c3b99d1 to 258a412 Compare September 10, 2025 10:04

coderabbitai bot reviewed Sep 10, 2025

View reviewed changes

pattern/pattern_test.go Show resolved Hide resolved

pattern/pattern_test.go Show resolved Hide resolved

dkharms force-pushed the 98-bytes-index branch from 258a412 to 098bcb7 Compare September 10, 2025 13:48

coderabbitai bot reviewed Sep 10, 2025

View reviewed changes

dkharms added 2 commits September 11, 2025 14:23

perf: use bytes.Index for substring search

b8d46b0

chore: remove substring file

fe6da01

dkharms force-pushed the 98-bytes-index branch from 098bcb7 to fe6da01 Compare September 11, 2025 11:23

perf: use bytes.Index for substring search #104

Are you sure you want to change the base?

perf: use bytes.Index for substring search #104

Uh oh!

Conversation

dkharms commented Sep 1, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Summary by CodeRabbit

Uh oh!

codecov-commenter commented Sep 1, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

coderabbitai bot commented Sep 1, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Estimated code review effort

Pre-merge checks (4 passed, 1 warning)

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

dkharms commented Sep 11, 2025

Uh oh!

coderabbitai bot commented Sep 11, 2025

Uh oh!

Uh oh!

perf: use `bytes.Index` for substring search #104

perf: use `bytes.Index` for substring search #104

dkharms commented Sep 1, 2025 •

edited by coderabbitai bot

Loading

codecov-commenter commented Sep 1, 2025 •

edited

Loading

coderabbitai bot commented Sep 1, 2025 •

edited

Loading