Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

Christine
by Contributor II
  • 7921 Views
  • 9 replies
  • 5 kudos

Resolved! PySpark DataFrame empties after it has been saved to Delta Lake

Hi, I am facing a problem that I hope to get some help understanding. I have created a function that is supposed to check if the input data already exists in a saved delta table and, if not, it should run some calculations and append the new data to...

Latest Reply
SharathE
New Contributor III
  • 5 kudos

Hi, I'm also having a similar issue. Does creating a temp view and reading it again after saving to a table work?

8 More Replies
fix_databricks
by New Contributor II
  • 1889 Views
  • 2 replies
  • 0 kudos

Cannot run another notebook from same directory

Hello, I am having a similar problem to the one in this thread, which was never resolved: https://community.databricks.com/t5/data-engineering/unexpected-error-while-calling-notebook-string-matching-regex-w/td-p/18691 I renamed a notebook (utility_data_wrangli...

Latest Reply
feiyun0112
Honored Contributor
  • 0 kudos

The error message indicates that the notebook name contains special characters. Please modify the notebook name to contain only English letters, and remove the double quotation marks: %run ./utility123
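As a rough sketch of that advice, one could pre-check names before wiring them into %run. The allowed character set below (ASCII letters, digits, underscores) is an assumption for illustration, not an official Databricks rule:

```python
import re

# Hypothetical helper: check whether a notebook name is safe to
# reference from %run. The allowed set is an assumption, not an
# official Databricks naming rule.
SAFE_NAME = re.compile(r"^[A-Za-z0-9_]+$")

def is_safe_notebook_name(name: str) -> bool:
    return bool(SAFE_NAME.match(name))
```

For example, `is_safe_notebook_name("utility123")` passes, while a name carrying a stray double quote does not.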

1 More Replies
SankaraiahNaray
by New Contributor II
  • 27041 Views
  • 10 replies
  • 5 kudos

Not able to read text file from local file path - Spark CSV reader

We are using the Spark CSV reader to read a CSV file and convert it to a DataFrame, and we are running the job on yarn-client; it works fine in local mode. We are submitting the Spark job from an edge node. But when we place the file in a local file path instead...

Latest Reply
AshleeBall
New Contributor II
  • 5 kudos

Thanks for your help. It helped me a lot.

9 More Replies
Karene
by New Contributor
  • 1071 Views
  • 1 reply
  • 0 kudos

Databricks Connection to Redash

Hello, I am trying to connect my Redash account with Databricks so that my organization can run queries on the data in Unity Catalog from Redash. I followed the steps in the documentation and managed to connect successfully. However, I am only ...

Latest Reply
JameDavi_51481
New Contributor III
  • 0 kudos

it looks like the Redash connector for Databricks is hard-coded to run `SHOW DATABASES`, which only shows `hive_metastore` by default. This probably needs to be updated to run `SHOW CATALOGS` and then `SHOW SCHEMAS in <catalog_name>` for each of thos...
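The catalog-then-schema enumeration described in that reply can be sketched as plain statement generation. This is only an illustration of the query shape, not a patch for the Redash connector; the catalog list is passed in as if it had been returned by SHOW CATALOGS:

```python
def schema_discovery_statements(catalogs):
    # Build the discovery queries described above: enumerate catalogs
    # first, then list the schemas inside each one. Statement strings
    # only; running them requires a live connection.
    return ["SHOW CATALOGS"] + [f"SHOW SCHEMAS IN {c}" for c in catalogs]
```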

ipreston
by New Contributor III
  • 4908 Views
  • 6 replies
  • 0 kudos

Possible false positive warning on DLT pipeline

I have a DLT pipeline script that starts by extracting metadata on the tables it should generate from a delta table. Each record returned from the table should be a dlt table to generate, so I use .collect() to turn each row into a list and then iter...

Latest Reply
ipreston
New Contributor III
  • 0 kudos

Thanks for the reply. Based on that response though, it seems like the warning itself is a bug in the DLT implementation. Per the docs "However, you can include these functions outside of table or view function definitions because this code is run on...
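The metadata-driven pattern under discussion can be sketched without a Databricks runtime. Below, a plain registry stands in for the DLT table decorator (an assumption for illustration), and a literal list stands in for the rows fetched once with .collect(), outside any table-definition function:

```python
TABLES = {}

def register_table(name):
    # Stand-in for the DLT table decorator: records one builder per name.
    def wrap(fn):
        TABLES[name] = fn
        return fn
    return wrap

# Stand-in for metadata rows collected once, outside any table definition.
metadata_rows = [{"name": "bronze_a"}, {"name": "bronze_b"}]

for row in metadata_rows:
    # Bind the loop value as a default argument so each generated
    # builder keeps its own table name instead of the last one.
    register_table(row["name"])(lambda name=row["name"]: f"rows for {name}")
```

The default-argument binding matters: without it, every generated builder would close over the final loop value.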

5 More Replies
israelst
by New Contributor II
  • 2440 Views
  • 7 replies
  • 5 kudos

DLT can't authenticate with Kinesis using instance profile

When running my notebook on personal compute with an instance profile, I am indeed able to readStream from Kinesis. But adding it as a DLT with UC, while specifying the same instance profile in the DLT pipeline settings, causes a "MissingAuthenticatio...

Data Engineering
Delta Live Tables
Unity Catalog
Latest Reply
Mathias_Peters
Contributor
  • 5 kudos

We have used the roleArn and role session name like this:
CREATE STREAMING TABLE table_name AS SELECT * FROM STREAM read_kinesis ( streamName => 'stream', initialPosition => 'earliest', roleArn => 'arn:aws:iam::ACCT_ID:role/R...

6 More Replies
NataliaCh
by New Contributor
  • 1354 Views
  • 0 replies
  • 0 kudos

Delta table cannot be reached with INTERNAL_ERROR

Hi all! I've been dropping and recreating delta tables at a new location. For one table something went wrong and now I can neither DROP nor recreate it. It is visible in the catalog; however, when I click on the table I see the message: [INTERNAL_ERROR] The ...

ashraf1395
by Contributor II
  • 729 Views
  • 1 reply
  • 0 kudos

How to extend free trial period or enter free startup tier to complete our POC for a client.

We are a data consultancy. Our free trial period is ending and we are still working on a POC for one of our potential clients, focusing on providing expert services around Databricks. 1. Is there a possibility that we can extend the free t...

Latest Reply
Mo
Databricks Employee
  • 0 kudos

Hey @ashraf1395, I suggest you contact your Databricks representative or account manager.

Mohit_m
by Valued Contributor II
  • 23659 Views
  • 3 replies
  • 4 kudos

Resolved! How to get the Job ID and Run ID and save into a database

We have a Databricks Job running with a main class and JAR file in it. Our JAR file code base is in Scala. Now, when our job starts running, we need to log the Job ID and Run ID into the database for future use. How can we achieve this?

Latest Reply
Bruno-Castro
New Contributor II
  • 4 kudos

That article is for members only; can we also explain here how to do it (for those who are not Medium members)? Thanks!
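Since the linked article is paywalled, here is a hedged sketch of one common approach: inside a job run, the notebook context carries tag metadata that includes the job and run identifiers. How the context JSON is obtained (e.g. via dbutils) is Databricks-specific and omitted here; the tag names "jobId" and "currentRunId" below are assumptions for illustration:

```python
import json

def extract_job_ids(context_json: str):
    # Pull the job and run identifiers out of a context payload.
    # Returns (job_id, run_id); either may be None outside a job run.
    tags = json.loads(context_json).get("tags", {})
    return tags.get("jobId"), tags.get("currentRunId")
```

The returned pair can then be written to the database with whatever insert mechanism the job already uses.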

2 More Replies
SreeG
by New Contributor II
  • 1249 Views
  • 3 replies
  • 0 kudos

CI/CD for Workflows

Hi, I am facing issues when deploying workflows to a different environment. The same works for notebooks and scripts, but when deploying the workflows it failed with "Authorization Failed. Your token may be expired or lack the valid scope". Anything shoul...

Latest Reply
Yeshwanth
Databricks Employee
  • 0 kudos

@SreeG thanks for confirming!

2 More Replies
MarkD
by New Contributor II
  • 1383 Views
  • 1 reply
  • 0 kudos

Is it possible to migrate data from one DLT pipeline to another?

Hi, we have a DLT pipeline that has been running for a while with a Hive Metastore target and has stored billions of records. We'd like to move the data to Unity Catalog. The documentation says "Existing pipelines that use the Hive metastore cannot...

Data Engineering
Delta Live Tables
dlt
Unity Catalog
Latest Reply
Yeshwanth
Databricks Employee
  • 0 kudos

@MarkD good day! I'm sorry, but according to the description, existing pipelines using the Hive metastore cannot be upgraded to use Unity Catalog. To migrate an existing pipeline that writes to Hive metastore, you must create a new pipeline and re-in...

TheDataDexter
by New Contributor III
  • 4029 Views
  • 3 replies
  • 3 kudos

Resolved! Single-Node cluster works but Multi-Node clusters do not read data.

I am currently working with a VNET-injected Databricks workspace. At the moment I have mounted an ADLS Gen2 resource on the Databricks cluster. When running notebooks on a single node that read, transform, and write data, we do not encounter any probl...

Latest Reply
ellafj
New Contributor II
  • 3 kudos

@TheDataDexter Did you find a solution to your problem? I am facing the same issue

2 More Replies
Red_blue_green
by New Contributor III
  • 9154 Views
  • 3 replies
  • 0 kudos

Databricks: Change the existing schema of columns to non-nullable for a delta table using PySpark?

Hello, I currently have a delta folder as a table with several columns that are nullable. I want to migrate data to the table and overwrite the content using PySpark, add several new columns, and make them not nullable. I have found a way to make the c...

Latest Reply
kanjinghat
New Contributor II
  • 0 kudos

Not sure if you found a solution, but you can also try the approach below. In this case you pass the full path to the Delta files rather than the table name itself:
spark.sql(f"ALTER TABLE delta.`{full_delta_path}` ALTER COLUMN {column_name} SET NOT NULL")
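Extending that reply into a loop over several columns can be sketched as statement generation. This is only a sketch: the statements still need a live Spark session to execute, and full_delta_path stands for wherever your Delta files live:

```python
def not_null_statements(full_delta_path, columns):
    # One ALTER TABLE ... SET NOT NULL statement per column, using the
    # path-based table addressing shown in the reply above.
    return [
        f"ALTER TABLE delta.`{full_delta_path}` ALTER COLUMN {col} SET NOT NULL"
        for col in columns
    ]
```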

2 More Replies
venkata_kishore
by New Contributor
  • 1549 Views
  • 1 reply
  • 1 kudos

Delta Live Tables - Oracle connectivity

Do Delta Live Tables pipelines support Oracle or other external database connectivity? I am getting an Oracle driver not found error, and DLT does not support Maven installs through asset bundles. ERRORS: 1) py4j.protocol.Py4JJavaError: An error occurred while call...

Data Engineering
Delta Live Tables
dlt
oracle
pipelines
Latest Reply
RamGoli
Databricks Employee
  • 1 kudos

Hi @venkata_kishore, as of now DLT does not support Oracle, and you cannot install third-party libraries and JARs: https://docs.databricks.com/en/delta-live-tables/unity-catalog.html#limitations If Lakehouse Federation has support for Oracle, then ...


Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group