Config details for this source:

env
    The environment that all assets produced by this connector belong to.

platform
    The platform that this source connects to.

platform_instance
    The instance of the platform that all assets produced by this recipe belong to.

database_alias
    Alias to apply to the database when ingesting.

sqlalchemy_uri
    URI of the database to connect to. Takes precedence over other connection parameters.

schema_pattern
    Regex patterns for schemas to filter in ingestion. Specify a regex to match only the schema name; e.g., to match all tables in schema analytics, use the regex 'analytics'.

profiling.turn_off_expensive_profiling_metrics
    Whether to turn off expensive profiling metrics. This turns off profiling for quantiles, distinct_value_frequencies, histogram, and sample_values, and also limits the maximum number of fields being profiled to 10.

profiling.profile_table_level_only
    Whether to perform profiling at the table level only, or to include column-level profiling as well.

profiling.include_field_null_count
    Whether to profile for the number of nulls for each column.

profiling.include_field_min_value
    Whether to profile for the min value of numeric columns.

profiling.include_field_max_value
    Whether to profile for the max value of numeric columns.

profiling.include_field_mean_value
    Whether to profile for the mean value of numeric columns.

profiling.include_field_median_value
    Whether to profile for the median value of numeric columns.

profiling.include_field_stddev_value
    Whether to profile for the standard deviation of numeric columns.

profiling.include_field_quantiles
    Whether to profile for the quantiles of numeric columns.

profiling.include_field_distinct_value_frequencies
    Whether to profile for distinct value frequencies.

profiling.include_field_histogram
    Whether to profile for the histogram of numeric fields.

profiling.include_field_sample_values
    Whether to profile for the sample values for all columns.

profiling.max_number_of_fields_to_profile
    A positive integer that specifies the maximum number of columns to profile for any table. The cost of profiling goes up significantly as the number of columns to profile goes up.

profiling.profile_if_updated_since_days
    Profile a table only if it has been updated within this many days. If set to null, no constraint on last modified time for tables to profile.

profiling.profile_table_size_limit
    Profile tables only if their size is less than the specified number of GBs. If set to null, no limit on the size of tables to profile. Supported only in snowflake, snowflake-beta, and BigQuery.

profiling.profile_table_row_limit
    Profile tables only if their row count is less than the specified count. If set to null, no limit on the row count of tables to profile. Supported only in snowflake-beta and BigQuery.

profiling.max_workers
    Number of worker threads to use for profiling.

profiling.report_dropped_profiles
    Whether to report datasets or dataset columns which were not profiled.

stateful_ingestion
    SQLAlchemyStatefulIngestionConfig (see below for fields).

stateful_ingestion.max_checkpoint_state_size
    The maximum size of the checkpoint state in bytes. Default is 16MB.

stateful_ingestion.state_provider
    The ingestion state provider configuration. DynamicTypedStateProviderConfig (see below for fields).

stateful_ingestion.state_provider.type (required if stateful_ingestion.state_provider is set)
    The type of the ingestion state provider registered with datahub.

stateful_ingestion.state_provider.config
    The configuration required for initializing the state provider. Default: the datahub_api config if set at pipeline level; otherwise, the default DatahubClientConfig.

stateful_ingestion.ignore_old_state
    If set to True, ignores the previous checkpoint state.

stateful_ingestion.ignore_new_state
    If set to True, ignores the current checkpoint state.

stateful_ingestion.remove_stale_metadata
    Soft-deletes entities of this type that were present in the last successful run but are missing in the current run, when stateful_ingestion is enabled.

stateful_ingestion.fail_safe_threshold
    Prevents a large amount of soft deletes, and the state from committing, after accidental changes to the source configuration: the commit is blocked if the relative change percent in entities compared to the previous state is above the fail_safe_threshold.
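The profiling options above map directly onto the profiling block of a DataHub ingestion recipe. The sketch below is illustrative only: the snowflake source type, the connection placeholder, and the specific limit values are assumptions, not values taken from this page.

```yaml
# Illustrative recipe sketch; source type and all values are placeholders.
source:
  type: snowflake              # assumed source type
  config:
    env: PROD
    schema_pattern:
      allow:
        - "analytics"          # regex: match all tables in schema analytics
    profiling:
      enabled: true
      turn_off_expensive_profiling_metrics: true  # disables quantiles, histograms,
                                                  # distinct value frequencies, and
                                                  # sample values; caps fields at 10
      profile_table_row_limit: 5000000  # skip tables with more rows (snowflake/BigQuery only)
      profile_table_size_limit: 5       # skip tables larger than 5 GB (snowflake/BigQuery only)
      max_workers: 10                   # worker threads to use for profiling
```

Because profiling cost scales with the number of columns, max_number_of_fields_to_profile (or the turn_off_expensive_profiling_metrics shortcut shown above) is usually the first knob to reach for on wide tables.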
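Similarly, here is a hedged sketch of the stateful_ingestion block, which controls the checkpointing and soft-delete behaviour described above. The datahub state provider type, the server endpoint, and the threshold value are assumptions for illustration; stateful ingestion also needs a stable top-level pipeline_name so checkpoints can be matched across runs.

```yaml
# Illustrative sketch; endpoint and threshold values are placeholders.
pipeline_name: my_snowflake_pipeline    # assumed name; keep it stable across runs
source:
  type: snowflake
  config:
    # ... connection and profiling config as above ...
    stateful_ingestion:
      enabled: true
      remove_stale_metadata: true  # soft-delete entities seen last run but missing now
      fail_safe_threshold: 20.0    # block the state commit if >20% of entities changed
      ignore_old_state: false      # set true to discard the previous checkpoint
      state_provider:
        type: datahub              # provider type registered with datahub
        config:
          server: "http://localhost:8080"  # placeholder; defaults to the pipeline-level
                                           # datahub_api config if set
```

The fail_safe_threshold exists precisely because remove_stale_metadata soft-deletes anything missing from the current run: a typo in schema_pattern could otherwise soft-delete most previously ingested entities in a single run.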