Releases: sdv-dev/SDMetrics

v0.15.1 - 2024-08-13

13 Aug 21:45

Bugs Fixed

  • X-axis for the bar plot should be labeled Value instead of Category - Issue #620 by @R-Palazzo
  • LinAlgError when plotting data that is constant - Issue #616 by @R-Palazzo
  • Wrong chart title when generating a box plot for just the real data using get_column_pair_plot() - Issue #615 by @R-Palazzo

New Features

  • Better error message when passing an SDV Metadata object - Issue #610 by @R-Palazzo
  • Check that every property score is index-free - Issue #583 by @R-Palazzo

v0.15.0 - 2024-07-15

15 Jul 14:22

This release adds support for NumPy 2.0! Additionally, the visualization utilities no longer require both real and synthetic data to be provided, and they can now be used to visualize only real or only synthetic data.

Maintenance

  • Switch to using ruff for Python linting and code formatting - Issue #536 by @gsheni
  • Change job names in integration workflow to "integration" - Issue #577 by @rwedge
  • Cap numpy to less than 2.0.0 until SDMetrics supports it - Issue #591 by @gsheni

New Features

  • Allow me to visualize just the real or synthetic data - Issue #581 by @lajohn4747
  • Update Referential Integrity metric to support NaNs in child column - Issue #587 by @R-Palazzo
  • Add support for numpy 2.0.0 - Issue #593 by @R-Palazzo

Bugs Fixed

  • ColumnPairTrends score depends on the data index - Issue #582 by @R-Palazzo
  • Datetime columns set to Object pandas dtype breaks LSTMDetection - Issue #584 by @fealho

v0.14.1 - 2024-05-13

13 May 19:27

This release patches a bug on the LSTMDetection metric.

Bugs Fixed

  • LSTMDetection metric crashes when there are multiple context columns - Issue #298 by @frances-h

Maintenance

  • Cleanup automated PR workflows - Issue #566 by @R-Palazzo
  • Only run unit and integration tests on oldest and latest python versions for macos - Issue #569 by @R-Palazzo

v0.14.0 - 2024-04-11

11 Apr 20:28

This release adds support for Python 3.12! It also improves the way the reports print in verbose mode.

Maintenance

New Features

  • Improve readability of the report scores when verbosity is on - Issue #538 by @lajohn4747

v0.13.1 - 2024-03-14

14 Mar 16:35

Maintenance

  • Transition from using setup.py to pyproject.toml to specify project metadata - Issue #534 by @lajohn4747
  • Remove bumpversion and use bump-my-version - Issue #535 by @R-Palazzo
  • Add support for Copulas 0.10 - Issue #541 by @amontanez24

v0.13.0 - 2023-12-04

04 Dec 19:06

This release makes significant improvements to the Diagnostic Reports! The report now runs a diagnostic to calculate scores for three basic but important properties of your data: data validity, data structure and, in the multi table case, relationship validity. Data validity checks that the columns of your data are valid (e.g. correct range or values). Data structure makes sure the synthetic data has the correct columns. Relationship validity checks that key references are correct and that the cardinality is within the ranges seen in the real data. These changes are meant to make the DiagnosticReport a quick way for you to see if there are any major problems with your synthetic data.
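
As a rough illustration of two of the checks described above, the sketch below computes a structure score (do the synthetic columns match the real ones?) and a key-reference score (do child foreign keys point at existing parent keys?) in plain Python. The function names and the scoring formulas are assumptions made for this example; they are not the SDMetrics implementation.

```python
def structure_score(real_columns, synthetic_columns):
    """Fraction of column names shared by both tables (hypothetical scoring)."""
    real, synthetic = set(real_columns), set(synthetic_columns)
    union = real | synthetic
    return len(real & synthetic) / len(union) if union else 1.0


def referential_integrity_score(parent_keys, child_foreign_keys):
    """Fraction of child foreign keys that reference an existing parent key."""
    valid_keys = set(parent_keys)
    if not child_foreign_keys:
        return 1.0
    matches = sum(1 for key in child_foreign_keys if key in valid_keys)
    return matches / len(child_foreign_keys)


# Toy data: one synthetic column is misnamed, one foreign key is orphaned.
structure = structure_score(['id', 'age', 'city'], ['id', 'age', 'zip'])
integrity = referential_integrity_score([1, 2, 3], [1, 1, 2, 4])
print(structure, integrity)
```

A score of 1.0 in either function means no problem was detected; anything lower points at a mismatch worth inspecting before looking at quality metrics.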

Additionally, some general improvements were made and bugs were resolved. The LogisticDetection and SVCDetection metrics were fixed to only use boolean, categorical, datetime and numeric columns in their calculations. A bug that prevented visualizations from displaying on Jupyter notebooks was patched. The cardinality property in the multi table QualityReport can now handle multiple foreign keys to the same parent. Finally, a new visualization was added for sequential/timeseries data called get_column_line_plot.

New Features

  • Detection metrics should only use statistically modeled columns (filter out the rest) - Issue #286 by @lajohn4747
  • Add visualization for timeseries / sequential data - Issue #376 by @lajohn4747
  • Multi table quality report should handle multi-foreign keys (to same parent) - Issue #406 by @R-Palazzo
  • Add KeyUniqueness metric - Issue #460 by @R-Palazzo
  • Add ReferentialIntegrity metric - Issue #461 by @R-Palazzo
  • Add CategoryAdherence metric - Issue #462 by @R-Palazzo
  • Add TableFormat metric - Issue #463 by @R-Palazzo
  • Add CardinalityBoundaryAdherence metric - Issue #464 by @frances-h
  • Add DataValidity property - Issue #467 by @R-Palazzo
  • Add Structure property - Issue #468 by @R-Palazzo
  • Add Relationship Validity property - Issue #469 by @R-Palazzo
  • Update DiagnosticReport to calculate base correctness of synthetic data - Issue #471 by @R-Palazzo
  • Update the synthetic data that's available for the multi-table demo - Issue #501 by @R-Palazzo
  • Update the synthetic data that's available for the single-table demo - Issue #502 by @R-Palazzo
  • Update TableFormat metric to TableStructure + fix its computation - Issue #518 by @R-Palazzo

Bugs Fixed

  • Sometimes graphs don't show when using Jupyter notebook - Issue #322 by @pvk-developer
  • Fix ReferentialIntegrity NaN handling - Issue #494 by @R-Palazzo
  • KeyUniqueness metric should only be applied to primary and alternate keys - Issue #503 by @R-Palazzo
  • Single table Structure property should not have visualization - Issue #504 by @R-Palazzo
  • Multi table Structure property visualization has incorrect styling - Issue #505 by @R-Palazzo
  • UserWarning: KeyError: 'relationships' in DiagnosticReport if metadata missing relationships - Issue #506 by @R-Palazzo
  • Report validate method should be private - Issue #507 by @R-Palazzo
  • ValueError in DiagnosticReport if synthetic data does not match metadata - Issue #508 by @R-Palazzo
  • Check if QualityReport needs the synthetic data to match the metadata - Issue #509 by @R-Palazzo
  • Running single table report on multi table data (or vice versa) results in confusing error - Issue #510 by @R-Palazzo
  • Add metadata validation - Issue #526 by @R-Palazzo

v0.12.1 - 2023-11-01

01 Nov 23:32

This release fixes two bugs: one with the new Intertable Trends property on older pandas versions, and one with how the ML Efficacy metric handled train and test data. Reports also handle missing relationships more gracefully.

Bugs Fixed

  • Multiple FutureWarning lines printed out when running the Quality Report (Intertable Trends property) - Issue #490 by @frances-h
  • Transformer should not be fit on test data - Issue #291 by @fealho
  • Reports should not crash if there are no relationships - Issue #481 by @lajohn4747

v0.12.0 - 2023-10-31

01 Nov 02:01

This release adds a new property, InterTable Trends. Several plots were moved from the reports module into the new visualizations module. The metadata parameter was removed for these plots, and the plot_types parameter was added. plot_types lets the user control which plot type is used. Several crashes have been resolved.

New Features

Bugs Fixed

  • Fix NewRowSynthesis on datetime columns without formats - Issue #473 by @fealho
  • Intertable trends property crashes if a table has no statistical columns - Issue #476 by @lajohn4747
  • Fix BoundaryAdherence NaN handling - Issue #470 by @frances-h
  • The Intertable Trends visualization is mislabeled as 'Column Shapes' - Issue #477 by @lajohn4747
  • ValueError when using get_cardinality_plot on some schemas - Issue #447 by @frances-h

Internal

v0.11.1 - 2023-09-14

14 Sep 21:33

This release makes multiple changes to better handle errors raised from the DiagnosticReport. The report should now run to completion and record any errors it encounters in a column of the details returned by get_details. It also resolves many warnings that were interrupting the printing of the report's results and progress.
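
The general pattern of finishing a run and recording failures in an error column, rather than stopping at the first exception, can be sketched as follows. This is a minimal illustration in plain Python; run_metrics, range_overlap, always_fails and the row layout are hypothetical names for this example and do not reflect how the DiagnosticReport is implemented.

```python
def run_metrics(metrics, real_column, synthetic_column):
    """Run every metric and record failures instead of stopping at the first one."""
    details = []
    for name, metric in metrics.items():
        row = {'Metric': name, 'Score': None, 'Error': None}
        try:
            row['Score'] = metric(real_column, synthetic_column)
        except Exception as error:
            # Keep the failure next to the metric name so it can be inspected later.
            row['Error'] = f'{type(error).__name__}: {error}'
        details.append(row)
    return details


def range_overlap(real, synthetic):
    """Hypothetical metric: share of synthetic values inside the real min/max."""
    low, high = min(real), max(real)
    return sum(low <= value <= high for value in synthetic) / len(synthetic)


def always_fails(real, synthetic):
    """Hypothetical metric that always raises, to exercise the error path."""
    raise ValueError('unsupported column type')


details = run_metrics(
    {'range_overlap': range_overlap, 'always_fails': always_fails},
    real_column=[1, 5, 10],
    synthetic_column=[2, 7, 12, 4],
)
for row in details:
    print(row)
```

Each row carries either a score or an error message, so one failing metric does not hide the results of the others.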

New Features

  • Create single table coverage property - Issue #389 by @R-Palazzo
  • Create single table synthesis property - Issue #390 by @R-Palazzo
  • Create single table Boundaries property - Issue #391 by @R-Palazzo
  • Add multi table Coverage, Synthesis and Boundaries property - Issue #393 by @R-Palazzo

Bugs Fixed

  • Ensure that the Synthesis property score doesn't change - Issue #425 by @amontanez24
  • The Error column contains a mix of NaN and None values - Issue #427 by @pvk-developer
  • Always show the Table column in get_details - Issue #429 by @frances-h
  • Diagnostic explanations should not repeat if I generate multiple times - Issue #430 by @amontanez24
  • RangeCoverage errors on datetime columns in DiagnosticReport - Issue #431 by @frances-h
  • The coverage visualization shows empty bar graph for nan values - Issue #432 by @frances-h
  • Diagnostic report should skip over all NaN columns - Issue #433 by @pvk-developer
  • Quality report is printing out a long warning message (hundreds of lines) - Issue #448 by @amontanez24

Internal

  • Use property classes in single table DiagnosticReport - Issue #392 by @R-Palazzo
  • Use property classes in multi table DiagnosticReport - Issue #394 by @R-Palazzo

v0.11.0 - 2023-08-10

10 Aug 21:01

This release adds a function that allows users to plot the cardinality of foreign and primary keys in synthetic data. More specifically, it graphs how often each number of children per parent row occurs in the parent table.
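
To make the quantity being plotted concrete, the sketch below counts, for toy data, how many parent rows have 0, 1, 2, ... children. It uses only the standard library; cardinality_distribution is a hypothetical helper written for this example, not a function from SDMetrics.

```python
from collections import Counter


def cardinality_distribution(parent_keys, child_foreign_keys):
    """Map 'number of children' to 'number of parent rows with that many children'."""
    children_per_parent = Counter(child_foreign_keys)
    # Iterate over parent keys so that parents without children are counted as 0.
    return Counter(children_per_parent.get(key, 0) for key in parent_keys)


distribution = cardinality_distribution(
    parent_keys=['a', 'b', 'c', 'd'],
    child_foreign_keys=['a', 'a', 'b', 'c', 'c'],
)
print(sorted(distribution.items()))
```

Computing this distribution once for the real data and once for the synthetic data gives the two series that a cardinality plot would compare.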

Additionally, architectural changes are made to improve the efficiency and error handling of the QualityReport! The progress bar is also enhanced to be more informative when the report is generating.

This release also adds support for Python 3.11 and drops support for Python 3.7.

New Features

  • Visualize cardinality of foreign key columns - Issue #283 by @R-Palazzo
  • Create single table BaseProperty class - Issue #354 by @amontanez24
  • Create single table column shapes property - Issue #355 by @R-Palazzo
  • Create single table column pair trends property - Issue #356 by @R-Palazzo
  • Create multi table BaseProperty class - Issue #357 by @pvk-developer
  • Create multi table column shapes and column pair trends properties - Issue #358 by @R-Palazzo
  • Create Parent Child Relationships property class - Issue #359 by @pvk-developer
  • In Multi Table Quality Report: Rename "Table Relationships" property to "Cardinality" - Issue #360 by @frances-h
  • More accurate progress bar for single table Quality Report - Issue #361 by @R-Palazzo
  • More accurate progress bar for multi table Quality Report - Issue #362 by @fealho
  • Raise error in CorrelationSimilarity if either column is constant - Issue #407 by @fealho

Bugs Fixed

  • Issue in building the denormalized table inside the Parent-Child Detection metrics - Issue #328 by @fealho
  • Don't modify the rounding in the quality report - Issue #401 by @R-Palazzo
  • The Cardinality property is missing some relationships - Issue #404 by @pvk-developer
  • The Cardinality property is not returning a DataFrame - Issue #405 by @fealho
  • Overall property score should be the average across all breakdowns - Issue #415 by @amontanez24

Internal

  • Use property classes in single table QualityReport - Issue #370 by @R-Palazzo
  • Use property classes in multi table QualityReport - Issue #371 by @fealho
  • Add add-on detection for premium metrics - Issue #388 by @amontanez24

Maintenance