Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

BriGuy
by New Contributor II
  • 2551 Views
  • 2 replies
  • 0 kudos

Create a one-off job run using the Databricks SDK

I'm trying to build the job spec using objects. When I try to execute the job I get the following error. I'm somewhat new to Python and not sure what I'm doing wrong here. Is anyone able to help? Traceback (most recent call last): File "y:\My ...

Latest Reply
Alberto_Umana
Databricks Employee
  • 0 kudos

Hi @BriGuy, can you try importing this module first? from databricks.sdk.service.jobs import PermissionLevel

1 More Replies
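For reference, a minimal sketch of submitting a one-off (one-time) run with the Databricks SDK, assuming a notebook task; the notebook path and cluster ID below are hypothetical placeholders, not taken from the thread:

from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

w = WorkspaceClient()
run = w.jobs.submit(
    run_name="one-off-run",
    tasks=[
        jobs.SubmitTask(
            task_key="main",
            existing_cluster_id="0000-000000-abcdefgh",  # hypothetical cluster ID
            notebook_task=jobs.NotebookTask(
                notebook_path="/Workspace/Users/me/my_notebook"  # hypothetical path
            ),
        )
    ],
).result()  # blocks until the one-time run finishes
print(run.state)

jobs.submit creates a one-time run without registering a job, which fits the "one off" requirement in the question.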
Dnirmania
by Contributor
  • 1527 Views
  • 2 replies
  • 0 kudos

Foreign Catalog refresh

Hi everyone, I recently created a foreign catalog from AWS Redshift in Databricks and I could see some tables, but when I ran the REFRESH FOREIGN SCHEMA command, it failed with the following error. I tried to search for it online but didn't get any...

Latest Reply
Dnirmania
Contributor
  • 0 kudos

REFRESH FOREIGN SCHEMA is a Databricks command to refresh a foreign catalog, and I don't have visibility into the queries it runs internally.

1 More Replies
allinux
by New Contributor II
  • 1339 Views
  • 2 replies
  • 0 kudos

When Try Returns Success for Invalid S3 Path in Spark: Is This a Bug?

Try(spark.read.format("parquet").load("s3://abcd/abcd/")) should result in Failure, but when executed in the notebook, it returns Success as shown below. Isn't this a bug? Try[DataFrame] = Success(...)

  • 1339 Views
  • 2 replies
  • 0 kudos
Latest Reply
MuthuLakshmi
Databricks Employee
  • 0 kudos

@allinux The read is a valid way to load data. Why are you expecting a failure? Can you please explain?

1 More Replies
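One way to see why the Try can succeed, sketched in Python for consistency with the other examples here (the thread itself uses Scala): parts of a file-source read can be deferred until an action runs, so wrapping only the read may report success while the failure surfaces later.

# Sketch, assuming a notebook SparkSession; the S3 path is the invalid one
# from the question. Forcing an action makes any deferred path/listing
# errors surface inside the try block.
try:
    df = spark.read.format("parquet").load("s3://abcd/abcd/")
    df.take(1)  # action forces file listing/reading
    print("read succeeded")
except Exception as e:
    print(f"read failed: {e}")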
loic
by Contributor
  • 1413 Views
  • 1 reply
  • 1 kudos

Resolved! Several executions of a single notebook lead to java.lang.OutOfMemoryError

Hello, I am facing an issue that I do not understand. I have a simple Scala notebook with a "read function" that reads a JSON file on external storage and makes a few changes to this DataFrame. I do my test on "all purpose" compute, DS3v2 (14gig/4cor...

Latest Reply
loic
Contributor
  • 1 kudos

Finally, we understood the issue ourselves. By default, Databricks creates a new session for each new job. It is possible to change this behavior with the Spark configuration (to put in the Spark config section of the compute settings): spark.databricks.sess...

FarBo
by New Contributor III
  • 12826 Views
  • 5 replies
  • 5 kudos

Spark issue handling data from json when the schema DataType mismatch occurs

Hi, I have encountered a problem using Spark when creating a DataFrame from a raw JSON source. I have defined a schema for my data, and the problem is that when there is a mismatch between one of the column values and its defined schema, Spark not onl...

Latest Reply
Anonymous
Not applicable
  • 5 kudos

@Farzad Bonabi: Thank you for reporting this issue. It seems to be a known bug in Spark when dealing with malformed decimal values. When a decimal value in the input JSON data is not parseable by Spark, it sets not only that column to null but also ...

4 More Replies
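A hedged sketch of one way to diagnose this: read with PERMISSIVE mode and a corrupt-record column so malformed rows are surfaced instead of silently nulled. The schema, column names, and file path are assumptions for illustration:

from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, DecimalType

schema = StructType([
    StructField("id", StringType()),
    StructField("amount", DecimalType(10, 2)),
    StructField("_corrupt_record", StringType()),  # captures unparseable rows
])
df = (
    spark.read.schema(schema)
    .option("mode", "PERMISSIVE")
    .option("columnNameOfCorruptRecord", "_corrupt_record")
    .json("/path/to/raw.json")  # hypothetical path
)
df = df.cache()  # required before filtering on the corrupt-record column alone
df.where(F.col("_corrupt_record").isNotNull()).show(truncate=False)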
Puent3
by New Contributor II
  • 1400 Views
  • 4 replies
  • 0 kudos

Error: from databricks import lakehouse_monitoring

We are using the following import: "from databricks import lakehouse_monitoring". We are receiving this error: ImportError: cannot import name 'lakehouse_monitoring' from 'databricks.sdk' (/databricks/python/lib/python3.11/site-packages/databricks/sdk...

Latest Reply
MadhuB
Valued Contributor
  • 0 kudos

I wasn't able to find that module. However, there are options under the SDK. Refer to the Lakehouse monitoring SDK reference: %python import databricks print(dir(databricks.sdk))

3 More Replies
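The discovery snippet from the reply, cleaned up into runnable form; it simply lists what the installed databricks-sdk package exposes, confirming whether lakehouse_monitoring is present:

import databricks.sdk

# Print the public names the SDK actually ships with in this environment.
print([name for name in dir(databricks.sdk) if not name.startswith("_")])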
Fikrat
by Databricks Partner
  • 1153 Views
  • 1 reply
  • 1 kudos

Lakeflow access

Hi, can someone please advise how to sign up for Lakeflow access? I believe it's in public preview now, but it's not listed in my workspace's preview features list. Thanks!

Latest Reply
Alberto_Umana
Databricks Employee
  • 1 kudos

Hi @Fikrat, LakeFlow is currently in a gated Public Preview. To participate in the preview, you need to contact your Databricks account team. It is not listed in the workspace's preview features list because it requires specific access permissions th...

vidya_kothavale
by Contributor
  • 1595 Views
  • 2 replies
  • 0 kudos

How to Get the Size of Filtered Rows in Databricks SQL

I have a query that filters rows from a table based on a timestamp range. The query is as follows: SELECT COUNT(*) FROM table_name WHERE ts >= '2025-02-04 00:00:00' AND ts < '2025-02-05 00:00:00'; This query returns 10 rows. I need to calculate the tot...

Latest Reply
MadhuB
Valued Contributor
  • 0 kudos

@vidya_kothavale try this code block. Keep in mind that you need to handle null values. SELECT SUM(OCTET_LENGTH(CAST(column1 AS STRING)) + OCTET_LENGTH(CAST(column2 AS STRING)) + OCTET_LENGTH(CAST(COALESCE(column3, '0') AS STRING))) as bytes, SUM(OCTET...

1 More Replies
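The same idea sketched in PySpark, in case the filter needs to stay in DataFrame code; the table and column names are hypothetical, and coalesce guards nullable columns as the reply suggests:

from pyspark.sql import functions as F

df = spark.table("table_name").where(
    (F.col("ts") >= "2025-02-04 00:00:00") & (F.col("ts") < "2025-02-05 00:00:00")
)
size = df.select(
    F.sum(
        F.octet_length(F.col("column1").cast("string"))
        + F.octet_length(F.coalesce(F.col("column3").cast("string"), F.lit("0")))
    ).alias("bytes")
)
size.show()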
ronaldgeneblazo
by New Contributor II
  • 1535 Views
  • 2 replies
  • 1 kudos

Urgent: Iceberg REST catalog - load a table returns a new JSON format

Hello, we are using Databricks Unity Catalog to load an Iceberg table (i.e., a Delta Lake table with the Uniform feature). We are using this guide: https://docs.databricks.com/en/external-access/iceberg.html. This has been working for us since last year bu...

Latest Reply
ronaldgeneblazo
New Contributor II
  • 1 kudos

Satyadeepak - it looks like this has been fixed on your end and we are no longer seeing this issue. Thanks for checking.

1 More Replies
Juju
by New Contributor II
  • 16635 Views
  • 5 replies
  • 1 kudos

DeltaFileNotFoundException: No file found in the directory (sudden task failure)

Hi all, I am currently running a job that upserts a table by reading from the Delta change data feed of my silver table. Here is the relevant snippet of code: rds_changes = spark.read.format("delta") \ .option("readChangeFeed", "true") \ .optio...

Latest Reply
c-data
New Contributor II
  • 1 kudos

What was the fix?

4 More Replies
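For context, a hedged sketch of the change-data-feed read described in the question; the table name and starting version are assumptions. One common cause of a DeltaFileNotFoundException here is requesting a startingVersion whose underlying files have since been vacuumed past the retention window, so bound it to retained history:

rds_changes = (
    spark.read.format("delta")
    .option("readChangeFeed", "true")
    .option("startingVersion", 5)  # hypothetical: last processed version + 1
    .table("silver.rds")  # hypothetical table name
)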
deng_dev
by New Contributor III
  • 1111 Views
  • 1 reply
  • 0 kudos

Autoloader: Cross-account bucket Assume role access denied

Hi everyone! I have a Databricks instance profile role that has permission to assume a role in another AWS account to access an S3 bucket in that account. When I try to assume the role using boto3, it correctly reads the Databricks AWS credentials, as...

Latest Reply
Ayushi_Suthar
Databricks Employee
  • 0 kudos

Hi @deng_dev, greetings! The error message above includes a request ID; could you please share that request ID with the AWS team to check why the request is being denied, as this looks like a permission issue. Please let me know if ...

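A sketch of the cross-account flow the question describes, with a hypothetical role ARN and bucket; boto3 picks up the instance profile credentials automatically:

import boto3

sts = boto3.client("sts")
resp = sts.assume_role(
    RoleArn="arn:aws:iam::123456789012:role/target-account-role",  # hypothetical
    RoleSessionName="databricks-autoloader",
)
creds = resp["Credentials"]
s3 = boto3.client(
    "s3",
    aws_access_key_id=creds["AccessKeyId"],
    aws_secret_access_key=creds["SecretAccessKey"],
    aws_session_token=creds["SessionToken"],
)
print(s3.list_objects_v2(Bucket="target-bucket", MaxKeys=1))  # hypothetical bucket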
carlos_tasayco
by Contributor
  • 1521 Views
  • 3 replies
  • 0 kudos

How to pull a parameter from a .sql file with dbutils.notebook.run

Hi, I want to use this: result = dbutils.notebook.run('/Workspace/Usersxxxxt', 600, {"environment": inputEnvironment}) This pulls from the .sql file at that path: DROP TEMPORARY VARIABLE IF EXISTS strEnv;DECLARE VARIABLE strEnv STRING;SET VARIABLE strE...

Latest Reply
MadhuB
Valued Contributor
  • 0 kudos

@carlos_tasayco There are two methods for passing a variable to other notebooks as input: using widgets, or using the collect method like below. # In notebook1 result = spark.sql("SELECT value FROM table").collect()[0][0] dbutils.notebook.exit(resu...

2 More Replies
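A sketch of both approaches side by side; the notebook path and parameter name are placeholders, not the truncated ones from the question:

# Caller notebook: pass a parameter in and receive a value back.
result = dbutils.notebook.run(
    "/Workspace/Users/me/child_notebook", 600, {"environment": "dev"}
)
print(result)

# Child notebook: read the parameter as a widget and return a result.
dbutils.widgets.text("environment", "")
env = dbutils.widgets.get("environment")
dbutils.notebook.exit(f"ran with environment={env}")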
Maatari
by New Contributor III
  • 5275 Views
  • 4 replies
  • 0 kudos

Resolved! What is the behaviour of startingVersion with Spark Structured Streaming?

Looking into the following: https://docs.databricks.com/en/structured-streaming/delta-lake.html#specify-initial-position I am unclear as to what the exact difference (if any) is between "startingVersion: The Delta Lake version to start from. Databricks ...

Latest Reply
VZLA
Databricks Employee
  • 0 kudos

Hi @dlorenzo, interesting take! I don’t agree with your statement, though. According to both the documentation and my own testing, startingVersion = "latest" explicitly skips all historical data and starts from the latest committed version at the tim...

3 More Replies
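A short sketch contrasting the two settings, assuming a hypothetical table path:

# Start from a specific commit: replays table history from version 10 onward.
spark.readStream.format("delta").option("startingVersion", 10).load("/delta/events")

# Start from "latest": skips all existing data; only commits made after the
# stream starts are processed.
spark.readStream.format("delta").option("startingVersion", "latest").load("/delta/events")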
jeremy98
by Honored Contributor
  • 3414 Views
  • 10 replies
  • 0 kudos

Allow serverless compute to connect to a Postgres DB

Hi Community, Is it possible to enable VNet peering between Databricks Serverless Compute and a private PostgreSQL database that is already configured with a VNet? Currently, everything works fine when I create my personal cluster because I have set up...

Latest Reply
Rjdudley
Honored Contributor
  • 0 kudos

Is that PostgreSQL server going to go away after you migrate to Databricks, or is it going to continue to be used?  Either way, federation works for you.  If you're going to discontinue it, just do a full extract into an archive location and a one-ti...

9 More Replies
ClaudeR
by New Contributor III
  • 5929 Views
  • 3 replies
  • 2 kudos

Resolved! [Simba][SparkJDBCDriver](500177) Error getting http path from connection string

I'm trying to use a very basic Java program to connect to Databricks using the Spark JDBC driver (SparkJDBC42.jar), but I get the error (mentioned above): [Simba][SparkJDBCDriver](500177) Error getting http path from connection string. Here is my code snip...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hello @Claude Repono, thank you for posting your question in the community. It seems you were able to find the solution yourself. That's awesome. We are going to go ahead and mark your answer as the best solution.

2 More Replies