What are the applicable limits when using Airflow to co-ordinate execution of jobs?
When using Airflow to co-ordinate execution of jobs, what are the applicable limits?
Hi Team, I am setting up a Kafka cluster on Databricks to ingest data into Delta. The cluster has been running for the last 2 hours, but the stream still has not started, and I am not seeing any failure either.
This type of issue happens if you have a firewall on your cloud account and your IP is not whitelisted. Please whitelist the IP and the issue should resolve.
Databricks notebook command gets cancelled: This generally happens when the cluster has init script or library issues while starting. The exact error can be found in the driver logs.
You can tag your cluster; the tags are propagated to billing management, where you can see the cost.
What is the difference between ADLS mounted on Databricks and DBFS? Does mounting ADLS on Databricks give any performance benefit? Does the mounted ADLS still behave as object storage, or does it become simple storage?
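As a rough sketch of the tagging step, a cluster specification can carry a `custom_tags` map. The cluster name, tag keys and values, runtime version, and node type below are all placeholders chosen for illustration, and the dict only mirrors the shape of a cluster spec; nothing is sent to a workspace.

```python
import json


def cluster_spec_with_tags(name, tags):
    """Build a cluster spec dict that carries cost-attribution tags."""
    return {
        "cluster_name": name,
        "spark_version": "<runtime-version>",  # placeholder, fill in for your workspace
        "node_type_id": "<node-type>",         # placeholder, cloud-specific
        "num_workers": 2,
        "custom_tags": dict(tags),             # tags to look for in billing reports
    }


# Hypothetical tag names; use whatever your billing reports group by.
spec = cluster_spec_with_tags("etl-nightly", {"team": "data-eng", "cost_center": "1234"})
print(json.dumps(spec, indent=2))
```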
DBFS is just an abstraction on cloud storage. By default, when you create a workspace, you get an instance of DBFS, the so-called DBFS Root. Plus you can mount additional storage accounts under the /mnt folder. Data written to mount point paths (/mnt) is...
Can I convert my JDBC DataFrame into JSON? When I tried it, I got an error. I'm using the Pandas DataFrame function df.to_json() in my script.
The answer varies depending on the cloud provider (as of June 2021). In GCP, since the architecture is based on GKE, there are additional IP requirements. For more details see
When table content is dumped from the RDBMS (e.g. Oracle), some column values may contain escaped double quotes (\"), which may cause the values from multiple columns to be concatenated into one value and result in corrupted reco...
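The concatenation effect can be reproduced in miniature with Python's standard `csv` module. This is only an illustration of the parsing problem, not the Spark reader the post refers to, and the sample record is made up.

```python
import csv

# One dumped record with three columns: id, size, city.
# The size value is 5" and the dump wrote the quote as \" (backslash-escaped).
line = '1,"5\\"","NY"'

# Default settings: the backslash is ordinary text and the following "" is read
# as a doubled quote, so the quoted field stays open past the comma.
naive = next(csv.reader([line]))

# Declaring the backslash as the escape character restores three columns.
fixed = next(csv.reader([line], escapechar="\\"))

print(len(naive), naive)
print(len(fixed), fixed)
```

With the default settings the size and city values end up in a single field; with the escape character declared, the record parses into three separate columns.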
No other output is available, not even output from cells that did run successfully. Also, I'm unable to connect to spark ui or view the logs. It makes an attempt to load each of them, but after some time an error message appears saying it's unable ...
Most of the time it's out of memory on the driver node. Check the driver log and the data node logs in the Spark UI. Also check whether you are collecting a large amount of data to the driver node, e.g. with collect().
Delta is open source but certain features such as OPTIMIZE, ZORDER are only available on managed DBR. So how open sourced is it really?
Some of the features were added exclusively by Databricks on top of Delta, not by the community, so the company has the right to decide whether or not to open source them.
Bucketing is a physical partitioning of the table, whereas Z-ordering is the arrangement of records within a file in the most optimal manner.
There isn’t a problem purging old data. When using auto loader it’ll take into account new data being added.
What is the best practice for generating jobs in an automated fashion?
There are several approaches here. You can write an automation script that programmatically accesses Databricks APIs to generate configured jobs. You can also utilize the Databricks Terraform provider. The benefit of the latter approach is that Terr...
How can I reduce the risk of data exfiltration while using Databricks?
Databricks enterprise security and admin features allow customers to deploy Databricks using their own managed VPC/VNET. This enables them to have greater flexibility and control over the configuration of their deployment architecture. For Azure follo...
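A minimal sketch of the scripted approach, assuming the Jobs API 2.1 `jobs/create` endpoint and a personal access token. The workspace URL, token, cluster ID, job names, and notebook paths are placeholders, and `build_create_job_request` is a hypothetical helper. The script only builds the requests and does not send them.

```python
import json
import urllib.request


def build_create_job_request(host, token, job_name, notebook_path, cluster_id):
    """Return an unsent POST request for the Jobs API 2.1 create endpoint."""
    payload = {
        "name": job_name,
        "tasks": [
            {
                "task_key": "main",
                "notebook_task": {"notebook_path": notebook_path},
                "existing_cluster_id": cluster_id,
            }
        ],
    }
    return urllib.request.Request(
        url=f"{host}/api/2.1/jobs/create",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical job configurations: (job name, notebook path).
configs = [
    ("ingest-orders", "/Jobs/ingest_orders"),
    ("ingest-users", "/Jobs/ingest_users"),
]

# Build one request per configured job; nothing is sent here.
job_requests = [
    build_create_job_request(
        "https://example.cloud.databricks.com", "<token>", name, path, "<cluster-id>"
    )
    for name, path in configs
]

for req in job_requests:
    print(req.get_method(), req.full_url, json.loads(req.data)["name"])
    # To actually create the job: urllib.request.urlopen(req)
```

Keeping the job definitions in a plain list makes it easy to load them from a config file instead and to review them before anything is submitted.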