Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

User16765131552
by Contributor III
  • 2218 Views
  • 0 replies
  • 0 kudos

DataFrame.write to a table containing always-generated and auto-generated columns is failing (SQL Server + sql-spark-connector)

DataFrame write to a SQL Server table containing an always-autogenerated column fails. I am using the Apache Spark Connector for SQL Server and Azure SQL. When the autogenerated fields are not included in the DataFrame, I encounter a "No key found" error. If auto-gene...

jose_gonzalez
by Moderator
  • 1961 Views
  • 1 replies
  • 0 kudos

Resolved! Can I use DBconnect to connect to any DBR version?

I would like to know if I can connect to any DBR version using DBconnect, or if only the supported versions will work.

Latest Reply
jose_gonzalez
Moderator
  • 0 kudos

Only the following Databricks Runtime versions are supported:
Databricks Runtime 8.1 ML, Databricks Runtime 8.1
Databricks Runtime 7.3 LTS ML, Databricks Runtime 7.3 LTS
Databricks Runtime 6.4 ML, Databricks Runtime 6.4
Databricks Runtime 5.5 LTS ML, Dat...

  • 0 kudos
MoJaMa
by Valued Contributor II
  • 785 Views
  • 1 replies
  • 0 kudos
Latest Reply
MoJaMa
Valued Contributor II
  • 0 kudos

Currently there is no concept of "Cluster Owner" (https://docs.databricks.com/security/access-control/cluster-acl.html#cluster-level-permissions). So you have to clone the cluster, which makes the person who cloned it the creator of the new cluster. The...

  • 0 kudos
jose_gonzalez
by Moderator
  • 984 Views
  • 1 replies
  • 0 kudos

Resolved! How can I connect my favorite IDE, like PyCharm, to a Databricks cluster?

I would like to know if there is a way to connect to a Databricks cluster using my IDE.

Latest Reply
jose_gonzalez
Moderator
  • 0 kudos

Databricks Connect allows you to connect your favorite IDE to Databricks clusters. You can find more details on how to set it up and install all the libraries at https://docs.databricks.com/dev-tools/databricks-connect.html

  • 0 kudos
aladda
by Honored Contributor II
  • 915 Views
  • 1 replies
  • 0 kudos
Latest Reply
aladda
Honored Contributor II
  • 0 kudos

There are two places to leverage GitHub for content management and version control in Databricks.
Repos for Git integration: Repos are folders whose contents are co-versioned together by syncing them to a remote Git repository. Repos can contain only Da...

  • 0 kudos
User15787040559
by New Contributor III
  • 1060 Views
  • 1 replies
  • 0 kudos

How to translate Apache Pig FOREACH GENERATE statement to Spark?

If you have the following Apache Pig FOREACH GENERATE statement:
XBCUD_Y_TMP1 = FOREACH (FILTER XBCUD BY act_ind == 'Y') GENERATE cust_hash_key, CONCAT(brd_abbr_cd, ctry_cd) as brdCtry:chararray, updt_dt_hash_key;
the equivalent code in Apache Spark is:
XB...

Latest Reply
User15725630784
New Contributor II
  • 0 kudos

The equivalent code in Apache Spark is:
from pyspark.sql.functions import col, concat
XBCUD_Y_TMP1_DF = (
    XBCUD_DF
    .filter(col("act_ind") == "Y")
    .select(
        col("cust_hash_key"),
        concat(col("brd_abbr_cd"), col("ctry_cd")).alias("brdCtry"),
        col("updt_dt_hash_key"),
    )
)
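To check what the statement computes without a cluster, here is a plain-Python sketch of the same filter-then-project logic on in-memory rows. This is not Spark or Pig code: the function name `foreach_generate` and the sample rows are hypothetical, and it assumes every field is a non-null string.

```python
def foreach_generate(rows):
    """Keep rows where act_ind == 'Y', then project three output fields."""
    return [
        {
            "cust_hash_key": row["cust_hash_key"],
            # Mirrors CONCAT(brd_abbr_cd, ctry_cd) AS brdCtry
            "brdCtry": row["brd_abbr_cd"] + row["ctry_cd"],
            "updt_dt_hash_key": row["updt_dt_hash_key"],
        }
        for row in rows
        if row["act_ind"] == "Y"
    ]


# Hypothetical sample rows, for illustration only.
xbcud = [
    {"cust_hash_key": "c1", "brd_abbr_cd": "AB", "ctry_cd": "US",
     "updt_dt_hash_key": "d1", "act_ind": "Y"},
    {"cust_hash_key": "c2", "brd_abbr_cd": "CD", "ctry_cd": "DE",
     "updt_dt_hash_key": "d2", "act_ind": "N"},
]

result = foreach_generate(xbcud)
print(result)
```

Only the first sample row passes the filter, so the result has a single projected record.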

  • 0 kudos
User15787040559
by New Contributor III
  • 1501 Views
  • 1 replies
  • 0 kudos

What timezone is the “timestamp” value in the Databricks Usage log?

What timezone is the “timestamp” value in the Databricks Usage log? Is it UTC?
timestamp
2020-12-01T00:59:59.000Z
Need to match this to the AWS Cost Explorer timezone for simplicity.
It's UTC. Please see timestamp under Audit Log Schema https://docs.databrick...

Latest Reply
User15725630784
New Contributor II
  • 0 kudos

UTC
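Because the value is UTC (the trailing `Z`), it can be parsed as a timezone-aware datetime before it is matched against timestamps from another system. A minimal sketch using only the Python standard library; the helper name `parse_usage_timestamp` is hypothetical, and the sample value is the one quoted in the question.

```python
from datetime import datetime, timezone


def parse_usage_timestamp(value: str) -> datetime:
    """Parse a usage-log timestamp such as 2020-12-01T00:59:59.000Z as UTC."""
    # strptime matches the trailing "Z" literally, so the UTC timezone
    # is attached explicitly afterwards.
    naive = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")
    return naive.replace(tzinfo=timezone.utc)


ts = parse_usage_timestamp("2020-12-01T00:59:59.000Z")
print(ts.isoformat())
```

Keeping the value timezone-aware means a later conversion with `astimezone(...)` stays explicit instead of silently assuming local time.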

  • 0 kudos
User16765131552
by Contributor III
  • 1692 Views
  • 1 replies
  • 1 kudos

Resolved! Create a new cluster in Databricks using databricks-cli

I'm trying to create a new cluster in Databricks on Azure using databricks-cli. I'm using the following command:
databricks clusters create --json '{ "cluster_name": "template2", "spark_version": "4.1.x-scala2.11" }'
And getting back this error: Error: ...

Latest Reply
User16765131552
Contributor III
  • 1 kudos

I found the right answer here. The correct format to run this command on Azure is:
databricks clusters create --json '{ "cluster_name": "my-cluster", "spark_version": "4.1.x-scala2.11", "node_type_id": "Standard_DS3_v2", "autoscale" : { "min_workers": ...
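Hand-quoting the JSON on the command line is easy to get wrong, so one option is to build the payload in Python and let `shlex.quote` handle the shell quoting. This is only a sketch: the first three fields come from the command above, while the `min_workers` and `max_workers` values are illustrative placeholders, because the original post is truncated at that point.

```python
import json
import shlex

cluster_spec = {
    "cluster_name": "my-cluster",
    "spark_version": "4.1.x-scala2.11",
    "node_type_id": "Standard_DS3_v2",
    # Placeholder worker counts (assumption); the values in the
    # original post are not visible.
    "autoscale": {"min_workers": 1, "max_workers": 2},
}

# Serialize once, then quote the whole JSON string for the shell.
payload = json.dumps(cluster_spec)
command = "databricks clusters create --json " + shlex.quote(payload)
print(command)
```

The printed line can be pasted into a POSIX shell; `json.dumps` guarantees well-formed JSON, which rules out a missing brace or comma as the cause of an error.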

  • 1 kudos
User16830818524
by New Contributor II
  • 10288 Views
  • 1 replies
  • 0 kudos

Read Delta Table with Pandas

Is it possible to read a Delta table directly into a Pandas Dataframe?

Latest Reply
aladda
Honored Contributor II
  • 0 kudos

You'd have to convert a Delta table to PyArrow and then use to_pandas. See https://databricks.com/blog/2020/12/22/natively-query-your-delta-lake-with-scala-java-and-python.html for details.
# Create a Pandas Dataframe by initially converting the Delta Lak...

  • 0 kudos
User15787040559
by New Contributor III
  • 1275 Views
  • 1 replies
  • 1 kudos

Why do we need the ec2:CreateTags and ec2:DeleteTags permissions?

Why do we need the ec2:CreateTags and ec2:DeleteTags permissions? Are they required? Are EC2 tags used internally as well?

Latest Reply
User15787040559
New Contributor III
  • 1 kudos

Yes, they are required. They are how Databricks tracks and tags resources. The tags are used to identify the owner of clusters on the AWS side, and Databricks uses the tag information internally as well.

  • 1 kudos
MoJaMa
by Valued Contributor II
  • 706 Views
  • 1 replies
  • 0 kudos
Latest Reply
MoJaMa
Valued Contributor II
  • 0 kudos

Yes. We can convert an existing workspace to PrivateLink on E2. So you can have one workspace that's on PL and one that's not. Please contact your Databricks Representative and we can help you make this change.

  • 0 kudos
HowardWong
by New Contributor II
  • 464 Views
  • 0 replies
  • 0 kudos

How do you handle Kafka offsets in a DR scenario?

If a structured streaming job with a checkpoint fails in one region for whatever reason, DR kicks in to run a job in another region. What is the best way for the DR job to pick up the offsets and continue where the failed job stopped?

User16826994223
by Honored Contributor III
  • 809 Views
  • 1 replies
  • 1 kudos

Does Databricks provide any isolation mechanisms when deployed in my account?

Does Databricks provide any isolation mechanisms when deployed in my account?

Latest Reply
Mooune_DBU
Valued Contributor
  • 1 kudos

If you're running on AWS: Databricks deploys Spark nodes in an Amazon Virtual Private Cloud (VPC) running in the customer’s own AWS account, giving the customer full control over their data and instances. VPCs enable customers to isolate the network ...

  • 1 kudos