Sampling
Sampling on the notebook
Add the following to your setup.py:
# The ddataflow config usually sits at the root of the project, so we list it
# explicitly in py_modules to make it importable when the project is
# installed as a library.
py_modules=[
    "ddataflow_config",
],
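Once the package has been rebuilt and installed, a quick check like the one below can confirm that the config module was actually shipped with it. This is only a sketch: the helper name is_module_importable is made up for illustration and is not part of ddataflow; it relies solely on the standard library.

```python
import importlib.util


def is_module_importable(module_name: str) -> bool:
    """Return True if a top-level module can be found on the current path."""
    # Hypothetical helper for illustration; not part of ddataflow.
    return importlib.util.find_spec(module_name) is not None


# "ddataflow_config" is the module name listed in py_modules.
if not is_module_importable("ddataflow_config"):
    print("ddataflow_config not found; check py_modules in setup.py and reinstall")
```

If the message is printed, the wheel was most likely built without the py_modules entry.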
With DBrocket
Cell 1
%pip install --upgrade pip
%pip install ddataflow
%pip install /dbfs/temp/user/search_ranking_pipeline-1.0.1-py3-none-any.whl --force-reinstall
Cell 2
from ddataflow_config import ddataflow
ddataflow.save_sampled_data_sources()
Then, when you are ready to copy the sampled data, call ddataflow.save_sampled_data_sources(dry_run=False).