by Mr__E • Contributor II
- 1593 Views
- 1 reply
- 3 kudos
We have a couple of sources we'd already set up to stream to prod using a 3p system. Is there a way to sync these directly to our dev workspace to build pipelines? E.g., directly connecting to a cluster in prod and pulling with a job cluster, dumping to S3 and u...
Latest Reply
DBFS can be used in many ways. Please refer below:
- Allows you to interact with object storage using directory and file semantics instead of cloud-specific API commands.
- Allows you to mount cloud object storage locations so that you can map storage cre...
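A minimal sketch of the mount-and-read workflow the reply describes, assuming the cluster already has credentials (e.g. an instance profile) for the bucket; the bucket name, mount point, and table path below are hypothetical:

# Mount an S3 bucket into DBFS so it can be browsed with file semantics.
# "my-prod-bucket" and "/mnt/prod-data" are hypothetical placeholders.
dbutils.fs.mount(
    source="s3a://my-prod-bucket",
    mount_point="/mnt/prod-data",
)

# Once mounted, the bucket behaves like a directory tree:
display(dbutils.fs.ls("/mnt/prod-data"))
df = spark.read.format("delta").load("/mnt/prod-data/events")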
by ftc • New Contributor II
- 3417 Views
- 3 replies
- 0 kudos
I'd like to know the design pattern for ingesting data via HTTP API requests. The pattern needs to use the multi-hop architecture. Do we need to ingest the JSON output to cloud storage first (not the bronze layer), then use Auto Loader to process the data further? ...
Latest Reply
API -> Cloud Storage -> Delta is the more suitable approach. Auto Loader helps ensure no data is lost (it keeps track of discovered files in the checkpoint location using RocksDB to provide exactly-once ingestion guarantees) and enables schema inference ev...
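A minimal sketch of the Auto Loader hop from cloud storage into a bronze Delta table, assuming the API responses have already been landed as JSON files; all paths and the table name are hypothetical:

# Incrementally ingest JSON files landed by the API job into a bronze table.
# The S3 paths and table name below are hypothetical placeholders.
raw = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", "s3://landing/_schemas/api")
    .load("s3://landing/api-dumps")
)

(
    raw.writeStream.format("delta")
    .option("checkpointLocation", "s3://landing/_checkpoints/api_bronze")
    .trigger(availableNow=True)  # drain all pending files, then stop
    .toTable("bronze.api_events")
)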
2 More Replies
- 2688 Views
- 1 reply
- 4 kudos
I have a Delta table with about 300 billion rows. Now I am performing some operations on a column using a UDF and creating another column. My code is something like this:

from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

def my_udf(data):
    pass  # placeholder; the real per-row logic goes here

udf_func = udf(my_udf, StringType())
data...
Latest Reply
That row-at-a-time Python UDF pays serialization overhead on every single row, so better not to use it on such a big dataset. What you need is a vectorized pandas UDF: https://docs.databricks.com/spark/latest/spark-sql/udf-python-pandas.html
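A minimal sketch of the vectorized alternative the reply links to; the function body, column, and table names are hypothetical stand-ins for the real logic:

import pandas as pd
from pyspark.sql.functions import pandas_udf
from pyspark.sql.types import StringType

# Vectorized pandas UDF: receives whole batches as pandas Series
# (transferred via Arrow) instead of one row at a time.
@pandas_udf(StringType())
def my_transform(data: pd.Series) -> pd.Series:
    return data.str.upper()  # hypothetical per-batch transformation

df = spark.table("my_big_delta_table")  # hypothetical table name
df = df.withColumn("new_col", my_transform("existing_col"))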
- 4439 Views
- 2 replies
- 0 kudos
I have an existing data pipeline that looks like this: a small MySQL data source (around 250 GB) whose data passes through Debezium / Kafka / a custom data redactor -> to Glue ETL jobs, and finally lands on Redshift, but the scale of the data is too sm...
Latest Reply
Dan_Z • Databricks Employee
There is a lot in this question, so generally speaking I suggest you reach out to the sales team at Databricks. You can talk to a solutions architect who can get into more detail. Here are my general thoughts, having seen a lot of customer arch: Generally,...
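Since the reply is truncated, here is a minimal sketch of what the Databricks side of such an architecture often looks like: Structured Streaming reading the existing Debezium/Kafka feed into a Delta table in place of the Glue ETL hop. The broker address, topic, checkpoint path, and table name are all hypothetical:

# Read the Debezium change feed from Kafka and land it in a Delta table.
# Broker, topic, checkpoint path, and table name are hypothetical placeholders.
events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "mysql.inventory.changes")
    .option("startingOffsets", "earliest")
    .load()
)

(
    events.selectExpr("CAST(key AS STRING) AS key", "CAST(value AS STRING) AS value")
    .writeStream.format("delta")
    .option("checkpointLocation", "s3://lake/_checkpoints/mysql_cdc")
    .toTable("bronze.mysql_cdc")
)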
1 More Replies