Data Engineering

Forum Posts

Murthy1
by Contributor II
  • 3213 Views
  • 2 replies
  • 0 kudos

How can we use an existing all-purpose cluster for a DLT pipeline?

I understand that DLT runs on its own job compute, but I would like to use an existing all-purpose cluster for the DLT pipeline. Is there a way I can achieve this?

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your question? This...

1 More Replies
feed
by New Contributor III
  • 3925 Views
  • 7 replies
  • 3 kudos

TesseractNotFoundError

Getting "TesseractNotFoundError: tesseract is not installed or it's not in your PATH. See README file for more information." in Databricks.

Latest Reply
neha_ayodhya
New Contributor II
  • 3 kudos

The command %sh apt-get install -y tesseract-ocr is not working in my new Databricks free trial account; earlier it worked fine in my old Databricks instance. I get the below error: E: Could not open lock file /var/lib/dpkg/lock-frontend - open (13: Per...

6 More Replies
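For anyone landing here: the error usually just means the tesseract binary is missing from the cluster OS. A minimal sketch, assuming a cluster where you are allowed to run shell commands (the sudo prefix also works around the dpkg lock error mentioned in the reply above):

```
%sh
sudo apt-get update && sudo apt-get install -y tesseract-ocr
```

```python
# After the install, point pytesseract at the binary and sanity-check it.
import pytesseract

pytesseract.pytesseract.tesseract_cmd = "/usr/bin/tesseract"  # default apt install path
print(pytesseract.get_tesseract_version())
```

For job clusters, the same apt-get line can go in a cluster-scoped init script so the binary survives restarts.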
sandeephenkel23
by New Contributor II
  • 1150 Views
  • 2 replies
  • 1 kudos

How to run a PowerShell file (script.ps1) from a Databricks notebook

Hello All, the following command is not working when run through a Databricks notebook: %sh # Bash code to print 'Hello, PowerShell!' echo 'Hello, PowerShell!' # powershell.exe -ExecutionPolicy Restricted -File /dbfs:/FileStore/Read_Vault_Inventory.ps1...

Latest Reply
Kaniz
Community Manager
  • 1 kudos

Thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your question? This...

1 More Replies
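Worth noting for this thread: Databricks clusters run Linux, so powershell.exe does not exist there; the script needs PowerShell Core (pwsh) installed first. A hedged sketch, assuming the cluster has outbound internet access and that Microsoft's published Linux install script is reachable; note the /dbfs FUSE path in place of dbfs:/:

```
%sh
# install PowerShell Core (pwsh) via Microsoft's installer script
wget -q https://aka.ms/install-powershell.sh -O /tmp/install-powershell.sh
sudo bash /tmp/install-powershell.sh

# run the script through pwsh, using the FUSE mount path rather than dbfs:/
pwsh -File /dbfs/FileStore/Read_Vault_Inventory.ps1
```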
hv129
by New Contributor
  • 941 Views
  • 2 replies
  • 1 kudos

java.lang.OutOfMemoryError on Data Ingestion and Storage Pipeline

I have around 25 GB of data in my Azure storage. I am performing data ingestion using Auto Loader in Databricks. Below are the steps I am performing: setting enableChangeDataFeed to true, reading the complete raw data using readStream, writing as del...

Latest Reply
Kaniz
Community Manager
  • 1 kudos

Thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your question? This...

1 More Replies
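A pattern that often resolves this class of OOM: the initial backfill pulls too much data into a single micro-batch. A minimal sketch, assuming Auto Loader over Parquet with placeholder paths; cloudFiles.maxBytesPerTrigger caps each batch and trigger(availableNow=True) drains the backlog in bounded increments:

```python
# cap per-batch volume so executors never hold the whole 25 GB at once
df = (spark.readStream
      .format("cloudFiles")
      .option("cloudFiles.format", "parquet")
      .option("cloudFiles.maxBytesPerTrigger", "1g")   # tune to cluster memory
      .load("abfss://raw@<account>.dfs.core.windows.net/source/"))  # placeholder

(df.writeStream
   .format("delta")
   .option("checkpointLocation", "/tmp/checkpoints/ingest")  # placeholder
   .trigger(availableNow=True)   # process existing files in bounded batches, then stop
   .toTable("bronze.raw_data"))  # placeholder table; enable CDF via TBLPROPERTIES
```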
AndyKeel
by New Contributor II
  • 464 Views
  • 3 replies
  • 0 kudos

Creating an ADLS storage credential for an AWS Workspace

I'd like to create a storage credential for an Azure Storage Account in an AWS workspace. I then plan to use this storage credential to create an external volume. Is this possible, and if so what are the steps? Thanks for any help!

Latest Reply
AndyKeel
New Contributor II
  • 0 kudos

Thanks for your help. I'm struggling to create the Storage Credential. I have created a managed identity via an Azure Databricks Access Connector and am making an API call based on what I'm reading in the API docs: Create a storage credential | Storag...

2 More Replies
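For reference, the REST call the reply describes looks roughly like this. A hedged sketch with placeholder host, token, and connector ID; one caveat worth verifying is that Azure managed-identity credentials are ordinarily created in Azure workspaces, so cross-cloud use from an AWS workspace may simply be unsupported:

```python
import requests

host = "https://<workspace-host>"    # placeholder
token = "<personal-access-token>"    # placeholder

resp = requests.post(
    f"{host}/api/2.1/unity-catalog/storage-credentials",
    headers={"Authorization": f"Bearer {token}"},
    json={
        "name": "adls_credential",
        "azure_managed_identity": {
            # resource ID of the Azure Databricks Access Connector (placeholder)
            "access_connector_id": "/subscriptions/<sub>/resourceGroups/<rg>/"
                                   "providers/Microsoft.Databricks/accessConnectors/<name>",
        },
    },
)
print(resp.status_code, resp.json())
```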
NirmalaSathiya
by New Contributor
  • 463 Views
  • 2 replies
  • 0 kudos

Not able to use _metadata to retrieve the file name while reading XML files

We are trying to retrieve the XML file name using _metadata, but it is not working. We are also not able to use input_file_name(), since we are using a shared cluster. We are reading the XML files using the com.databricks.spark.xml library.

Data Engineering
filename
read
XML
Latest Reply
Kaniz
Community Manager
  • 0 kudos

Thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your question? This...

1 More Replies
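One likely cause here: the hidden _metadata column is populated only by Spark's built-in file sources, so the third-party com.databricks.spark.xml reader never exposes it. A hedged sketch, assuming a runtime with the native XML reader (DBR 14.3+, an assumption worth verifying), which does work on shared clusters; rowTag and path are placeholders:

```python
from pyspark.sql.functions import col

df = (spark.read
      .format("xml")                       # native reader, not the spark-xml package
      .option("rowTag", "record")          # placeholder row tag
      .load("/Volumes/main/raw/xml/")      # placeholder path
      .select("*", col("_metadata.file_name").alias("source_file")))
```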
ElaPG
by New Contributor III
  • 1601 Views
  • 7 replies
  • 1 kudos

Command restrictions

Is there any way to restrict usage of specific commands (like mount/unmount or SQL GRANT) based on group assignment? I do not want everybody to be able to execute these commands.

Latest Reply
Kaniz
Community Manager
  • 1 kudos

Thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your question? This...

6 More Replies
Cas
by New Contributor III
  • 876 Views
  • 3 replies
  • 1 kudos

Asset Bundles: Dynamic job cluster insertion in jobs

Hi! As we are migrating from dbx to asset bundles, we are running into some problems with the dynamic insertion of job clusters into the job definition. With dbx we did this nicely with Jinja and defined all the clusters in one place, and a change in th...

Data Engineering
asset bundles
jobs
Latest Reply
Kaniz
Community Manager
  • 1 kudos

Thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your question? This...

2 More Replies
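One way to approximate the old Jinja setup inside bundles: define the cluster spec once as a bundle variable and reference it from each job. A hedged sketch of databricks.yml, assuming your CLI version supports complex-typed variables; all values are placeholders:

```yaml
variables:
  default_job_cluster:
    type: complex
    default:
      spark_version: "14.3.x-scala2.12"
      node_type_id: "Standard_DS3_v2"
      num_workers: 2

resources:
  jobs:
    my_job:
      name: my_job
      job_clusters:
        - job_cluster_key: main
          new_cluster: ${var.default_job_cluster}   # one edit updates every job
```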
israelst
by New Contributor II
  • 780 Views
  • 5 replies
  • 1 kudos

Structured Streaming schema inference

I want to stream data from Kinesis using DLT. The data is in JSON format. How can I use Structured Streaming to automatically infer the schema? I know Auto Loader has this feature, but it doesn't make sense for me to use Auto Loader since my data is st...

Latest Reply
israelst
New Contributor II
  • 1 kudos

I wanted to use Databricks for this. I don't want to depend on AWS Glue. The same way I could do it with Auto Loader...

4 More Replies
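A pattern that fits this constraint: infer the schema once from a bounded sample, then apply it to the stream with from_json, since the Kinesis source only delivers a binary data column. A minimal sketch with placeholder stream name and paths:

```python
from pyspark.sql.functions import col, from_json

# one-off batch read of sample records landed to storage, just to infer a schema
sample_schema = spark.read.json("s3://my-bucket/kinesis-samples/").schema  # placeholder

stream = (spark.readStream
          .format("kinesis")
          .option("streamName", "my-stream")   # placeholder
          .option("region", "us-east-1")
          .load()
          .select(from_json(col("data").cast("string"), sample_schema).alias("payload"))
          .select("payload.*"))
```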
Esther_Tomi
by New Contributor
  • 545 Views
  • 2 replies
  • 0 kudos

Unable to Install Cluster-Scoped Libraries on Runtime >13.3

Hello team, I'm trying to upgrade our Databricks runtime from 9.1 to 13.3, but I've been having issues installing libraries on the compute from our internal Artifactory. However, when I tried this on a Unity Catalog-enabled workspace, it works seamless...

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your question? This...

1 More Replies
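When the library UI install fails on newer runtimes, it can help while debugging to point pip at the internal index explicitly from a notebook. A hedged sketch; the index URL and package name are placeholders, and on UC-enabled workspaces an admin may additionally need to allowlist the index or artifacts:

```
%pip install --index-url https://artifactory.example.com/artifactory/api/pypi/pypi-local/simple my-internal-package
```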
pshah83
by New Contributor II
  • 806 Views
  • 2 replies
  • 1 kudos

Use output of the SHOW PARTITIONS command in a subquery/CTE/function

I am using SHOW PARTITIONS <<table_name>> to get all the partitions of a table. I want to use max() on the output of this command to get the latest partition for the table. However, I am not able to use SHOW PARTITIONS <<table_name>> in a CTE/sub-quer...

Latest Reply
Kaniz
Community Manager
  • 1 kudos

Thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your question? This...

1 More Replies
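SHOW PARTITIONS is a command rather than a relation, so it cannot sit inside a CTE, but its result can be captured as a DataFrame and aggregated, or you can aggregate the partition column directly. A minimal sketch with placeholder table and column names:

```python
from pyspark.sql.functions import max as max_

# capture the command output (a single string column named "partition")
parts = spark.sql("SHOW PARTITIONS my_db.my_table")
latest = parts.agg(max_("partition")).first()[0]
print(latest)

# often simpler, and still benefits from partition pruning
spark.sql("SELECT max(partition_col) AS latest FROM my_db.my_table").show()
```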
Simha
by New Contributor II
  • 1131 Views
  • 2 replies
  • 1 kudos

How to write only a single file to Blob or ADLS from Databricks?

Hi All, I am trying to write a CSV file to Blob and ADLS from a Databricks notebook using PySpark, but a separate folder is created with the given filename and a partition file is created within that folder. I want only the file to be written. Can anyone...

Latest Reply
Kaniz
Community Manager
  • 1 kudos

Thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your question? This...

1 More Replies
Mr__D
by New Contributor II
  • 2863 Views
  • 7 replies
  • 1 kudos

Resolved! Writing modular code in Databricks

Hi All, could you please suggest the best way to write PySpark code in Databricks? I don't want to write my code in a Databricks notebook, but rather create Python files (a modular project) in VS Code and call only the primary function in the notebook (the res...

Latest Reply
Gamlet
New Contributor II
  • 1 kudos

Certainly! To write PySpark code in Databricks while maintaining a modular project in VSCode, you can organize your PySpark code into Python files in VSCode, with a primary function encapsulating the main logic. Then, upload these files to Databricks...

6 More Replies
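For readers with the same question, a minimal sketch of the layout the accepted answer describes, assuming the files live in a Databricks Git folder (Repos) so the repo root is importable from its notebooks; all names are placeholders:

```python
# Repo layout (placeholder names):
#   my_project/
#     etl/
#       __init__.py
#       transforms.py     # pure functions that take and return DataFrames
#     notebooks/
#       run_etl           # thin notebook entry point
#
# Notebook cell: modules in the same repo are importable directly.
from etl.transforms import run_pipeline

run_pipeline(spark, source="/Volumes/main/raw/events", target="main.silver.events")
```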
Danielsg94
by New Contributor II
  • 13504 Views
  • 6 replies
  • 2 kudos

Resolved! How can I write a single file to a blob storage using a Python notebook, to a folder with other data?

When I use the following code: df.coalesce(1).write.format("com.databricks.spark.csv").option("header", "true").save("/path/mydata.csv") it writes several files, and when used with .mode("overwrite"), it will overwrite everything in th...

Latest Reply
Simha
New Contributor II
  • 2 kudos

Hi Daniel, may I know how you fixed this issue? I am facing a similar issue while writing CSV/Parquet to Blob/ADLS: it creates a separate folder with the filename and creates a partition file within that folder. I need to write just a file on to the b...

5 More Replies
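Since this keeps coming up in the thread: Spark always writes a directory of part files, so the usual workaround is to write to a temporary folder and copy out the single part file. A minimal sketch with placeholder paths:

```python
tmp_dir = "abfss://container@account.dfs.core.windows.net/tmp/mydata"      # placeholder
target  = "abfss://container@account.dfs.core.windows.net/out/mydata.csv"  # placeholder

(df.coalesce(1)            # single partition -> single part file
   .write.mode("overwrite")
   .option("header", "true")
   .csv(tmp_dir))

# locate the lone part file, copy it to the final name, clean up the folder
part = [f.path for f in dbutils.fs.ls(tmp_dir) if f.name.startswith("part-")][0]
dbutils.fs.cp(part, target)
dbutils.fs.rm(tmp_dir, recurse=True)
```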
rt-slowth
by Contributor
  • 520 Views
  • 3 replies
  • 0 kudos

Resolved! Pipelines using dlt modules from the Unity Catalog

[Situation] I am using AWS DMS to store MySQL CDC in S3 as Parquet files. I have implemented a streaming pipeline using the DLT module. The target destination is Unity Catalog. [Questions and issues] - Where are the tables and materialized views specifi...

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your question? This...

2 More Replies
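On the storage question: with a Unity Catalog target, DLT tables and materialized views are registered under <catalog>.<target_schema>, and their files live in that catalog/schema's managed storage location rather than under dbfs:/pipelines. A hedged sketch of the source side of such a pipeline, with placeholder S3 paths:

```python
import dlt

@dlt.table(name="orders_cdc_raw", comment="Raw CDC rows from DMS parquet drops")
def orders_cdc_raw():
    return (spark.readStream
            .format("cloudFiles")                    # Auto Loader over the DMS bucket
            .option("cloudFiles.format", "parquet")
            .load("s3://my-dms-bucket/orders/"))     # placeholder path
```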