- 
                Notifications
    
You must be signed in to change notification settings  - Fork 838
 
Introduce user scanner to Ruler/Alertmanager #6999
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Introduce user scanner to Ruler/Alertmanager #6999
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we should extract user scanner into a dedicated package? pkg/storage/tsdb might not be the best place.
| 
           @yeya24  | 
    
          
 I am thinking about   | 
    
          
 ok, it looks good.  | 
    
e8c616c    to
    04acdd5      
    Compare
  
    ee1808c    to
    15f2ccd      
    Compare
  
    | 
           What is the difference for us between user and tenant? It is a bit weird to have   | 
    
| 
           @danielblando  | 
    
b0e778d    to
    22768c9      
    Compare
  
    | 
           @danielblando @yeya24  | 
    
fbfa1dd    to
    2953348      
    Compare
  
            
          
                pkg/util/errors/errors.go
              
                Outdated
          
        
      | } | ||
| } | ||
| 
               | 
          ||
| func IsOneOfTheExpectedErrors(f ...objstore.IsOpFailureExpectedFunc) objstore.IsOpFailureExpectedFunc { | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this function is not part of the bucket package but in a generate pkg/util/errors? This seems specific to object store failures so I am not sure if it should be in the generic error file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved IsOneOfTheExpectedErrors to the bucket package. It causes a cyclic import if it is in the tsdb package.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think of using only the list strategy and only exposing cache_ttl as config?
| func NewBucketAlertStore(bkt objstore.InstrumentedBucket, userScannerCfg users.UsersScannerConfig, cfgProvider bucket.TenantConfigProvider, logger log.Logger, reg prometheus.Registerer) (*BucketAlertStore, error) { | ||
| alertBucket := bucket.NewPrefixedBucketClient(bkt, alertsPrefix) | ||
| 
               | 
          ||
| usersScanner, err := users.NewScanner(userScannerCfg, alertBucket, logger, extprom.WrapRegistererWith(prometheus.Labels{"component": "alertmanager"}, reg)) | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we support user index based user scanner? Who will work as the user index updater for AM and Ruler bucket?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, the user index updater would also be necessary for Alertmanager and Ruler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, for compactor, we use Ring to choose one compactor pod to periodically update the user index in the storage bucket. I think we need the same thing for Ruler/AM as they are using different S3 buckets
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about adding clean_up_interval for the user index updater? Currently, it uses clean_up_interval in the Compactor.
| cortex_testutil "github.com/cortexproject/cortex/pkg/storage/tsdb/testutil" | ||
| "github.com/cortexproject/cortex/pkg/util/services" | ||
| "github.com/cortexproject/cortex/pkg/util/test" | ||
| cotex_testutil "github.com/cortexproject/cortex/pkg/util/testutil" | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems a typo.
f262ab1    to
    fd54015      
    Compare
  
    | 
           @yeya24  | 
    
fc2f24c    to
    271165b      
    Compare
  
    2be7635    to
    0c582b3      
    Compare
  
            
          
                CHANGELOG.md
              
                Outdated
          
        
      | * [CHANGE] StoreGateway/Alertmanager: Add default 5s connection timeout on client. #6603 | ||
| * [CHANGE] Ingester: Remove EnableNativeHistograms config flag and instead gate keep through new per-tenant limit at ingestion. #6718 | ||
| * [CHANGE] Validate a tenantID when to use a single tenant resolver. #6727 | ||
| * [FEATURE] Alertmanager/Ruler: Introduce a user scanner to reduce the number of list calls to object storage. #6999 | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please move this changelog to master/unreleased section?
Also I think this is more an enhancement rather than a feature
0c582b3    to
    714d0bd      
    Compare
  
    Signed-off-by: SungJin1212 <[email protected]>
Signed-off-by: SungJin1212 <[email protected]>
Signed-off-by: SungJin1212 <[email protected]>
Signed-off-by: SungJin1212 <[email protected]>
Signed-off-by: SungJin1212 <[email protected]>
Signed-off-by: SungJin1212 <[email protected]>
Signed-off-by: SungJin1212 <[email protected]>
Signed-off-by: SungJin1212 <[email protected]>
Signed-off-by: SungJin1212 <[email protected]>
Signed-off-by: SungJin1212 <[email protected]>
Signed-off-by: SungJin1212 <[email protected]>
Signed-off-by: SungJin1212 <[email protected]>
Signed-off-by: SungJin1212 <[email protected]>
Signed-off-by: SungJin1212 <[email protected]>
Signed-off-by: SungJin1212 <[email protected]>
714d0bd    to
    015b423      
    Compare
  
    
This PR introduces a user scanner to reduce the number of list calls to object storage.
Which issue(s) this PR fixes:
Fixes #6914
Checklist
CHANGELOG.mdupdated - the order of entries should be[CHANGE],[FEATURE],[ENHANCEMENT],[BUGFIX]