Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

User16826994223
by Honored Contributor III
  • 618 Views
  • 1 replies
  • 0 kudos

Stream not starting from Kafka 2 hours after cluster start

Hi team, I am setting up the Kafka cluster on Databricks to ingest data into Delta. The cluster has been running for the last 2 hours, but the stream has still not started and I am not seeing any failure either.

Latest Reply
User16826994223
Honored Contributor III
  • 0 kudos

This type of issue happens if you have a firewall on your cloud account and your IP is not whitelisted. Please whitelist the IP and the issue should resolve.
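
One way to check the firewall explanation in the reply above is to test, from a notebook on the cluster, whether a plain TCP connection to the broker can be opened. The sketch below uses only the Python standard library. The function name can_reach and the example host and port are placeholders, not values from this thread. A successful connect only shows that the port is reachable; it says nothing about Kafka authentication or topic access.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


# Hypothetical usage; replace with your own bootstrap server:
# can_reach("kafka-broker.example.com", 9092)
```

A firewall that silently drops packets usually shows up here as a timeout rather than an immediate refusal, which would be consistent with a stream that never starts and never reports a failure.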

User16783853032
by New Contributor II
  • 848 Views
  • 1 replies
  • 0 kudos

Databricks notebook command gets cancelled

A Databricks notebook command generally gets cancelled when the cluster has init script or library issues while starting. The exact error can be found in the driver logs.

Latest Reply
User16826994223
Honored Contributor III
  • 0 kudos

Awesome knowledge

User16826994223
by Honored Contributor III
  • 1053 Views
  • 1 replies
  • 0 kudos

Azure Databricks with Storage Account as data layer and DBFS understanding

What is the difference between ADLS mounted on Databricks and DBFS? Does mounting ADLS on Databricks give any performance benefit? Does the mounted ADLS still behave as object storage, or does it become simple storage?

Latest Reply
User16826994223
Honored Contributor III
  • 0 kudos

DBFS is just an abstraction on top of cloud storage. By default, when you create a workspace, you get an instance of DBFS, the so-called DBFS root. You can also mount additional storage accounts under the /mnt folder. Data written to mount point paths (/mnt) is...

User16826994223
by Honored Contributor III
  • 4039 Views
  • 1 replies
  • 0 kudos

How to convert a DataFrame into JSON on Databricks?

Can I convert my JDBC DataFrame into JSON? When I tried, I got an error. I'm using the pandas DataFrame function df.to_json().

Latest Reply
User16826994223
Honored Contributor III
  • 0 kudos

Use df.toJSON(). A DataFrame read over JDBC is a Spark DataFrame, and to_json() is a pandas method.

User16783855534
by New Contributor III
  • 3066 Views
  • 3 replies
  • 1 kudos
Latest Reply
sajith_appukutt
Honored Contributor II
  • 1 kudos

The answer varies depending on the cloud provider (as of June 2021). In GCP, since the architecture is based on GKE, there are additional IP requirements. For more details see

2 More Replies
Anonymous
by Not applicable
  • 651 Views
  • 0 replies
  • 0 kudos

Escaped quotes mess up table records

When table content is dumped from an RDBMS (e.g. Oracle), some column values may contain escaped double quotes (\"), which may cause the values from multiple columns to be concatenated into one value and result in corrupted reco...
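
The effect described above can be reproduced outside Spark with Python's standard csv module. This is only an illustrative sketch: the sample line and column names are invented, and how a given reader misplaces the column boundaries depends on the parser.

```python
import csv
import io

# One exported line whose second column contains backslash-escaped quotes.
raw = 'id,comment,status\n1,"He said \\"hi\\", then left",ok\n'

# Default dialect: a backslash is an ordinary character, so each \" is
# treated as a quote character and the column boundaries shift.
naive_rows = list(csv.reader(io.StringIO(raw)))

# Declaring the backslash as the escape character keeps the value intact.
fixed_rows = list(csv.reader(io.StringIO(raw), escapechar="\\"))

print(naive_rows[1])
print(fixed_rows[1])
```

Spark's CSV reader exposes comparable quote and escape options. Whether they resolve the case described in this post depends on how the dump was written.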

JustinMills
by New Contributor III
  • 29718 Views
  • 6 replies
  • 0 kudos

Resolved! Job fails with "The spark driver has stopped unexpectedly and is restarting. Your notebook will be automatically reattached."

No other output is available, not even output from cells that did run successfully. Also, I'm unable to connect to the Spark UI or view the logs. It makes an attempt to load each of them, but after some time an error message appears saying it's unable ...

Latest Reply
lzlkni
New Contributor II
  • 0 kudos

Most of the time the driver node is out of memory. Check the driver log and the worker node logs in the Spark UI, and check whether you are collecting a large amount of data to the driver node, e.g. with collect().

5 More Replies
Anonymous
by Not applicable
  • 599 Views
  • 1 replies
  • 0 kudos

Delta - open source?

Delta is open source but certain features such as OPTIMIZE, ZORDER are only available on managed DBR. So how open sourced is it really?

Latest Reply
User16826994223
Honored Contributor III
  • 0 kudos

Some features were added exclusively by Databricks on top of Delta, not by the community, so the company has the right to decide whether or not to open-source them.

User16789201666
by Contributor II
  • 836 Views
  • 1 replies
  • 0 kudos
Latest Reply
User16789201666
Contributor II
  • 0 kudos

There isn’t a problem purging old data. Auto Loader will take new data into account as it is added.

User16789201666
by Contributor II
  • 891 Views
  • 1 replies
  • 2 kudos

What is the best practice for generating jobs in an automated fashion?

What is the best practice for generating jobs in an automated fashion?

Latest Reply
User16789201666
Contributor II
  • 2 kudos

There are several approaches here. You can write an automation script that programmatically accesses the Databricks APIs to generate configured jobs. You can also use the Databricks Terraform provider. The benefit of the latter approach is that Terr...
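
As a rough sketch of the scripted approach, the snippet below generates one job definition per table from a template. Nothing here comes from the original reply: the helper build_job_spec, the job names, and the notebook paths are hypothetical, the field names follow the request shape of the Jobs API 2.1 create endpoint as I understand it, and the cluster values are placeholders. Check the Jobs API reference for your workspace before sending anything.

```python
import json


def build_job_spec(name: str, notebook_path: str, num_workers: int = 2) -> dict:
    """Build a request body for a single-task notebook job (illustrative)."""
    return {
        "name": name,
        "tasks": [
            {
                "task_key": "main",
                "notebook_task": {"notebook_path": notebook_path},
                "new_cluster": {
                    # Placeholders: fill in values valid for your workspace.
                    "spark_version": "<runtime-version>",
                    "node_type_id": "<node-type>",
                    "num_workers": num_workers,
                },
            }
        ],
    }


# Generate one spec per (hypothetical) table name.
specs = [
    build_job_spec(f"ingest-{table}", f"/Jobs/ingest_{table}")
    for table in ("orders", "customers")
]
print(json.dumps(specs[0], indent=2))
```

Each generated body could then be posted to the job creation endpoint with an authenticated HTTP client. Keeping the specs as plain data also makes them easy to review or diff before submission.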

sajith_appukutt
by Honored Contributor II
  • 406 Views
  • 1 replies
  • 0 kudos

How can I reduce the risk of data exfiltration while using Databricks?

How can I reduce the risk of data exfiltration while using Databricks?

Latest Reply
sajith_appukutt
Honored Contributor II
  • 0 kudos

Databricks enterprise security and admin features allow customers to deploy Databricks using their own managed VPC/VNet. This gives them greater flexibility and control over the configuration of their deployment architecture. For Azure follo...

Anonymous
by Not applicable
  • 610 Views
  • 0 replies
  • 0 kudos

Newline characters mess up the table records

When creating tables from text files containing newline characters in the middle of the lines, the table records will have null column values because the newline characters in the middle of the lines break the lines into two different records and fill up ...
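
The record splitting described above can be illustrated with Python's standard csv module. This is a sketch with invented sample data, and it assumes the values containing newlines are quoted in the source file; unquoted embedded newlines cannot be recovered this way.

```python
import csv
import io

# Two logical records; the first has a newline inside a quoted value.
raw = 'id,note,status\n1,"first line\nsecond line",ok\n2,plain,ok\n'

# Splitting on newlines first treats the embedded newline as a record break,
# so the first record becomes two short rows.
naive_rows = [line.split(",") for line in raw.splitlines()[1:]]

# A quote-aware parser reads the quoted value as one field across lines.
reader = csv.reader(io.StringIO(raw, newline=""))
header = next(reader)
parsed_rows = list(reader)

print(len(naive_rows), len(parsed_rows))
```

Spark's CSV reader has a multiLine option intended for quoted values that span lines. Whether it applies to the files in this post depends on how they were produced.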
