Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

User16826992666
by Valued Contributor
  • 1462 Views
  • 1 reply
  • 0 kudos

Resolved! When running a MERGE, if records are deleted from the table, are the underlying files that contain those records deleted as well?

I know I have the option to delete rows from a Delta table when running a merge. But I'm confused about how that would actually affect the files that contain the deleted records. Are those files deleted, or are they rewritten, or what?

Latest Reply
sajith_appukutt
Honored Contributor II
  • 0 kudos

Delta implements MERGE by physically rewriting existing files. It is implemented in two steps:
  • Perform an inner join between the target table and the source table to select all files that have matches.
  • Perform an outer join between the selected files in t...
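
For illustration, here is a minimal sketch of a MERGE that deletes matched rows, assuming a Delta table named "target" and a source DataFrame source_df joined on an id column (table name, DataFrame, and key are hypothetical):

from delta.tables import DeltaTable

target = DeltaTable.forName(spark, "target")      # hypothetical table name
(target.alias("t")
    .merge(source_df.alias("s"), "t.id = s.id")   # "id" is an assumed join key
    .whenMatchedDelete()
    .execute())
# Matched rows are removed by rewriting the files that contained them;
# the superseded files remain on storage until VACUUM cleans them up.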

User16826992666
by Valued Contributor
  • 1062 Views
  • 1 reply
  • 0 kudos

Resolved! Are Delta tables able to support GDPR compliance?

I know that when deletes are made from a Delta table, the underlying files are not actually removed. For compliance reasons I need to be able to truly delete the records. How can I know which files need to be removed, and is there a way to remove them ot...

Latest Reply
sajith_appukutt
Honored Contributor II
  • 0 kudos

Here is a document explaining best practices for GDPR and CCPA compliance using Delta Lake. Specifically, on cleaning up stale data: you can use the VACUUM command to remove files that are no longer referenced by a Delta table and are older than a s...
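
As a minimal sketch, assuming a Delta table named "events" (hypothetical) and the default 7-day retention:

from delta.tables import DeltaTable

events = DeltaTable.forName(spark, "events")  # hypothetical table name
events.vacuum(168)  # delete unreferenced files older than 168 hours (7 days)

# Equivalent SQL:
spark.sql("VACUUM events RETAIN 168 HOURS")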

User16765131552
by Contributor III
  • 2230 Views
  • 0 replies
  • 0 kudos

DataFrame.write fails for a table containing always-generated and auto-generated columns (SQL Server + sql-spark-connector)

DataFrame write to a SQL Server table containing an always-generated column fails. I am using the Apache Spark Connector for SQL Server and Azure SQL. When the autogenerated field is not included in the dataframe, I encounter a "No key found" error. If auto-gene...
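
No reply was posted, but for context, a write with this connector typically looks like the sketch below; one common workaround is to drop the generated column from the DataFrame so SQL Server populates it itself (connection details, column name, and credential variables are hypothetical):

# Drop the always-generated column so SQL Server fills it in on insert
df_out = df.drop("row_version")  # hypothetical generated column

(df_out.write
    .format("com.microsoft.sqlserver.jdbc.spark")
    .mode("append")
    .option("url", "jdbc:sqlserver://myserver.database.windows.net:1433;databaseName=mydb")  # hypothetical
    .option("dbtable", "dbo.my_table")   # hypothetical
    .option("user", sql_user)            # credentials assumed defined elsewhere
    .option("password", sql_password)
    .save())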

jose_gonzalez
by Moderator
  • 2357 Views
  • 1 reply
  • 0 kudos

Resolved! Can I use DBConnect to connect to any DBR version?

I would like to know if I can use DBConnect to connect to any DBR version, or if only the supported versions will work.

Latest Reply
jose_gonzalez
Moderator
  • 0 kudos

Only the following Databricks Runtime versions are supported:
  • Databricks Runtime 8.1 ML, Databricks Runtime 8.1
  • Databricks Runtime 7.3 LTS ML, Databricks Runtime 7.3 LTS
  • Databricks Runtime 6.4 ML, Databricks Runtime 6.4
  • Databricks Runtime 5.5 LTS ML, Dat...
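
The databricks-connect client version must also match the cluster's runtime. For example, to target a DBR 7.3 LTS cluster, a sketch of the documented setup:

pip uninstall pyspark
pip install -U "databricks-connect==7.3.*"
databricks-connect configure   # prompts for workspace URL, token, and cluster ID
databricks-connect test        # verifies connectivity to the cluster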

MoJaMa
by Valued Contributor II
  • 795 Views
  • 1 reply
  • 0 kudos
Latest Reply
MoJaMa
Valued Contributor II
  • 0 kudos

Currently there is no concept of "Cluster Owner". https://docs.databricks.com/security/access-control/cluster-acl.html#cluster-level-permissions So you have to clone the cluster, thus making the person who cloned it the creator of the new cluster. The...

jose_gonzalez
by Moderator
  • 996 Views
  • 1 reply
  • 0 kudos

Resolved! How can I connect my favorite IDE, like PyCharm, to a Databricks cluster?

I would like to know if there is a way to connect to a Databricks cluster from my IDE.

Latest Reply
jose_gonzalez
Moderator
  • 0 kudos

Databricks Connect allows you to connect your favorite IDE to Databricks clusters. You can find details on how to set it up and install the required libraries at https://docs.databricks.com/dev-tools/databricks-connect.html
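
Once databricks-connect is configured, code in the IDE runs against the remote cluster through an ordinary SparkSession; a minimal sketch:

from pyspark.sql import SparkSession

# With databricks-connect configured, this session is backed by the remote cluster
spark = SparkSession.builder.getOrCreate()
print(spark.range(10).count())  # the count executes on the Databricks cluster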

aladda
by Honored Contributor II
  • 931 Views
  • 1 reply
  • 0 kudos
Latest Reply
aladda
Honored Contributor II
  • 0 kudos

There are two places to leverage GitHub for content management and version control in Databricks:
  • Repos for Git integration: Repos are folders whose contents are co-versioned together by syncing them to a remote Git repository. Repos can contain only Da...

User15787040559
by New Contributor III
  • 1070 Views
  • 1 reply
  • 0 kudos

How to translate Apache Pig FOREACH GENERATE statement to Spark?

If you have the following Apache Pig FOREACH GENERATE statement:

XBCUD_Y_TMP1 = FOREACH (FILTER XBCUD BY act_ind == 'Y') GENERATE cust_hash_key, CONCAT(brd_abbr_cd, ctry_cd) as brdCtry:chararray, updt_dt_hash_key;

the equivalent code in Apache Spark is: XB...

Latest Reply
User15725630784
New Contributor II
  • 0 kudos

the equivalent code in Apache Spark is:

from pyspark.sql.functions import col, concat

XBCUD_Y_TMP1_DF = (
    XBCUD_DF
    .filter(col("act_ind") == "Y")
    .select(
        col("cust_hash_key"),
        concat(col("brd_abbr_cd"), col("ctry_cd")).alias("brdCtry"),
        col("updt_dt_hash_key"),
    )
)

User15787040559
by New Contributor III
  • 1527 Views
  • 1 reply
  • 0 kudos

What timezone is the “timestamp” value in the Databricks Usage log?

What timezone is the “timestamp” value in the Databricks Usage log? Is it UTC?

timestamp: 2020-12-01T00:59:59.000Z

Need to match this to AWS Cost Explorer timezone for simplicity. It's UTC. Please see timestamp under Audit Log Schema https://docs.databrick...

Latest Reply
User15725630784
New Contributor II
  • 0 kudos

UTC

User16765131552
by Contributor III
  • 1713 Views
  • 1 reply
  • 1 kudos

Resolved! Create a new cluster in Databricks using databricks-cli

I'm trying to create a new cluster in Databricks on Azure using databricks-cli. I'm using the following command:

databricks clusters create --json '{ "cluster_name": "template2", "spark_version": "4.1.x-scala2.11" }'

And getting back this error: Error: ...

Latest Reply
User16765131552
Contributor III
  • 1 kudos

I found the right answer here. The correct format to run this command on Azure is:

databricks clusters create --json '{ "cluster_name": "my-cluster", "spark_version": "4.1.x-scala2.11", "node_type_id": "Standard_DS3_v2", "autoscale" : { "min_workers": ...
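
For reference, a complete payload might look like the sketch below (node type and worker counts are illustrative assumptions, not the original poster's values):

databricks clusters create --json '{
  "cluster_name": "my-cluster",
  "spark_version": "4.1.x-scala2.11",
  "node_type_id": "Standard_DS3_v2",
  "autoscale": { "min_workers": 2, "max_workers": 8 }
}'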

User16830818524
by New Contributor II
  • 10997 Views
  • 1 reply
  • 0 kudos

Read Delta Table with Pandas

Is it possible to read a Delta table directly into a Pandas Dataframe?

Latest Reply
aladda
Honored Contributor II
  • 0 kudos

You'd have to convert the Delta table to PyArrow and then use to_pandas. See https://databricks.com/blog/2020/12/22/natively-query-your-delta-lake-with-scala-java-and-python.html for details. # Create a Pandas Dataframe by initially converting the Delta Lak...
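
A minimal sketch with the delta-rs Python bindings (the deltalake package; the table path is a hypothetical example):

from deltalake import DeltaTable

dt = DeltaTable("/data/events")           # hypothetical table path
pdf = dt.to_pyarrow_table().to_pandas()   # Delta -> PyArrow -> pandas
print(pdf.head())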

User15787040559
by New Contributor III
  • 1293 Views
  • 1 reply
  • 1 kudos

Why do we need the ec2:CreateTags and ec2:DeleteTags permissions?

Why do we need the ec2:CreateTags and ec2:DeleteTags permissions? Are they required? Are EC2 tags used internally as well?

Latest Reply
User15787040559
New Contributor III
  • 1 kudos

Yes, they're required. That's how Databricks tracks and tags resources. The tags are used to identify the owner of clusters on the AWS side, and Databricks uses the tag information internally as well.
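
For reference, the cross-account IAM policy grants tagging along these lines (a sketch; the exact resource scoping varies by deployment):

{
  "Effect": "Allow",
  "Action": ["ec2:CreateTags", "ec2:DeleteTags"],
  "Resource": [
    "arn:aws:ec2:*:*:instance/*",
    "arn:aws:ec2:*:*:volume/*"
  ]
}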

MoJaMa
by Valued Contributor II
  • 715 Views
  • 1 reply
  • 0 kudos
Latest Reply
MoJaMa
Valued Contributor II
  • 0 kudos

Yes. We can convert an existing workspace to PrivateLink on E2. So you can have one workspace that's on PL and one that's not. Please contact your Databricks representative and we can help you make this change.
