Data Engineering

Forum Posts

User16826987838
by Contributor
  • 1102 Views
  • 2 replies
  • 0 kudos

Convert PDFs into structured data

Is there anything on Databricks to help read PDFs (payment invoices and receipts, for example) and convert them to structured data?

Latest Reply
SoniaFoster
New Contributor II
  • 0 kudos

Thanks! Converting PDF format is sometimes a difficult task as not all converters provide accuracy. I want to share with you one interesting tool I recently discovered that can make your work even more efficient. I recently came across an amazing onl...

  • 0 kudos
1 More Replies
elgeo
by Valued Contributor II
  • 5539 Views
  • 6 replies
  • 4 kudos

Resolved! Data type length enforcement

Hello. Is there a way to enforce the length of a column in SQL? For example, that a column has to be exactly 18 characters? Thank you!

Latest Reply
databricks31
New Contributor II
  • 4 kudos

We are facing a similar issue when writing to an ADLS location in Delta format, after which we created Unity Catalog tables on top of the Delta location. Is it possible to change the data type lengths below to ones that Spark SQL supports? Azure SQL Spark            ...

  • 4 kudos
5 More Replies
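
For the column-length question in the thread above, one option that may fit is a Delta Lake CHECK constraint on the table. The sketch below only builds the SQL statement as a string; the table name, column name, and constraint naming scheme are hypothetical and are not taken from the thread.

```python
def length_check_constraint(table: str, column: str, length: int) -> str:
    """Build an ALTER TABLE statement that adds a CHECK constraint
    requiring `column` to be exactly `length` characters long."""
    # The constraint name is an arbitrary convention chosen for this sketch.
    constraint_name = f"{column}_len_{length}"
    return (
        f"ALTER TABLE {table} "
        f"ADD CONSTRAINT {constraint_name} "
        f"CHECK (length({column}) = {length})"
    )


# Hypothetical table and column names, for illustration only.
statement = length_check_constraint("main.sales.accounts", "account_id", 18)
print(statement)
```

In a Databricks notebook the resulting statement could be passed to `spark.sql(statement)`. Rows already in the table would need to satisfy the condition for the constraint to be added.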
ChrisS
by New Contributor III
  • 1916 Views
  • 7 replies
  • 8 kudos

How to get data scraped from the web into your data storage

I am learning Databricks for the first time, following a book that was copyrighted in 2020, so I imagine it might be a little outdated at this point. What I am trying to do is move data from an online source (in this specific case using a shell script but ...

Latest Reply
CharlesReily
New Contributor III
  • 8 kudos

In Databricks, you can install external libraries by going to the Clusters tab, selecting your cluster, and then adding the Maven coordinates for Deequ. This represents the best b2b data enrichment services in Databricks. In your notebook or script, y...

  • 8 kudos
6 More Replies
dvmentalmadess
by Valued Contributor
  • 2615 Views
  • 9 replies
  • 1 kudos

Resolved! Data Explorer minimum permissions

What minimum permissions are required to search and view objects in Data Explorer? For example, does a user have to have `USE [SCHEMA|CATALOG]` to search or browse in the Data Explorer? Or can anyone with workspace access browse objects and, ...

Latest Reply
bearded_data
New Contributor II
  • 1 kudos

Hi all -  @LandanG I wanted to bump this thread to see if there was any traction on giving us the ability to expose the table metadata to users (using USE <object> permission) while not allowing the users to SELECT from the tables themselves?  I thin...

  • 1 kudos
8 More Replies
mriccardi
by New Contributor II
  • 1443 Views
  • 4 replies
  • 1 kudos

Spark streaming: Checkpoint not recognising new data

Hello everyone! We are currently facing an issue with a stream that has not picked up new data since the 20th of July. We've validated that the bronze table has data that silver doesn't have. Also, looking at the logs, the silver stream is running but writing 0 files....

Latest Reply
mriccardi
New Contributor II
  • 1 kudos

Also, the trigger is configured to run once, but when we start the job it never ends; it stays in an endless loop.

  • 1 kudos
3 More Replies
amruth
by New Contributor
  • 1293 Views
  • 4 replies
  • 0 kudos

How do I retrieve timestamp data from history in Databricks SQL without using a Delta table? The data comes from SAP

I am not using Delta tables; my data is from SAP. How do I retrieve the timestamp (history) dynamically from an SAP table using Databricks SQL?

Latest Reply
Dribka
New Contributor III
  • 0 kudos

@amruth If you're working with data from SAP in Databricks and want to retrieve timestamps dynamically from an SAP table, you can use Databricks SQL to achieve this. You'll need to identify the specific SAP table that contains the timestamp or his...

  • 0 kudos
3 More Replies
Anonymous
by Not applicable
  • 20464 Views
  • 6 replies
  • 9 kudos

How to connect and extract data from SharePoint using Databricks (AWS)?

We are using Databricks (on AWS). We need to connect to SharePoint and extract and load data into a Databricks Delta table. Is there any possible solution for this?

Latest Reply
yliu
New Contributor III
  • 9 kudos

Wondering the same. Can we use the SharePoint REST API to download the file, save it to DBFS or an external location, and read it from there?

  • 9 kudos
5 More Replies
Phani1
by Valued Contributor
  • 3050 Views
  • 3 replies
  • 0 kudos

Data Quality in Databricks

Hi Databricks Team, we would like to implement data quality rules in Databricks. Apart from DLT, is there a standard approach for applying data quality rules on the bronze layer before proceeding to the silver and gold layers?

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @Phani1,
  • Databricks recommends applying data quality rules on the bronze layer before proceeding to the silver and gold layer.
  • The recommended approach involves storing data quality rules in a Delta table.
  • The rules are categorized by a tag ...

  • 0 kudos
2 More Replies
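
The idea in the reply above (rules kept in a table and categorized by a tag) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the approach from the thread: the rule names, the constraints, the tag values, and the `name`/`constraint`/`tag` field names are all hypothetical, and an in-memory list stands in for the Delta table.

```python
# Hypothetical rules; in the approach described above they would be rows in a Delta table.
RULES = [
    {"name": "valid_id", "constraint": "id IS NOT NULL", "tag": "bronze"},
    {"name": "positive_amount", "constraint": "amount > 0", "tag": "bronze"},
    {"name": "known_country", "constraint": "country IN ('US', 'GB')", "tag": "silver"},
]


def rules_for_tag(rules, tag):
    """Return a {rule name: SQL constraint} mapping for the rules carrying `tag`."""
    return {r["name"]: r["constraint"] for r in rules if r["tag"] == tag}


def combined_filter(rules, tag):
    """AND together every constraint for `tag` into one SQL boolean expression."""
    selected = rules_for_tag(rules, tag)
    return " AND ".join(f"({constraint})" for constraint in selected.values())


bronze_filter = combined_filter(RULES, "bronze")
print(bronze_filter)
```

The combined expression is a plain SQL string, so it could be passed to a PySpark `DataFrame.filter` call to keep only the rows that satisfy every bronze rule before they move on to the next layer.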
Swostiman
by New Contributor II
  • 3359 Views
  • 5 replies
  • 4 kudos

Consuming data from a Databricks [Hive metastore] SQL endpoint using PySpark

I was trying to read some Delta data from a Databricks [Hive metastore] SQL endpoint using PySpark, but after fetching, all the values in the table are the same as the column names. Even when I try to just show the data it giv...

Latest Reply
sucan
New Contributor II
  • 4 kudos

I encountered the same issue, and downgrading to 2.6.22 resolved it for me.

  • 4 kudos
4 More Replies
Binesh
by New Contributor II
  • 1287 Views
  • 2 replies
  • 0 kudos

Databricks logs some error messages while trying to read data using the databricks-jdbc dependency

I have tried to read data from Databricks using the following Java code.
String TOKEN = "token...";
String url = "url...";
Properties properties = new Properties();
properties.setProperty("user", "token");
properties.setProperty("PWD", TOKEN);
Con...

Logger Errors
Latest Reply
shan_chandra
Honored Contributor III
  • 0 kudos

@Binesh J - The issue could be that the column's data type is not compatible with the getString() method on line 17. Use the getObject() method to retrieve the value as a generic object and then convert it to a string.

  • 0 kudos
1 More Replies
vanessafvg
by New Contributor III
  • 1008 Views
  • 1 replies
  • 3 kudos

Extracting data from Excel in data lake storage using openpyxl

I am trying to extract some data into Databricks but keep tripping over openpyxl. I am a newish user of Databricks.
from openpyxl import load_workbook
directory_id="hidden"
scope="hidden"
client_id="hidden"
service_credential_key="hidden"
container_name="hidden"
s...

Latest Reply
Anonymous
Not applicable
  • 3 kudos

Hi @Vanessa Van Gelder, great to meet you, and thanks for your question! Let's see if your peers in the community have an answer to your question. Thanks.

  • 3 kudos
Ram443
by New Contributor III
  • 14629 Views
  • 9 replies
  • 5 kudos

Resolved! I created a data frame but was not able to see the data

Code to create a data frame:
from pyspark.sql import SparkSession
spark=SparkSession.builder.appName("oracle_queries").master("local[4]")\
  .config("spark.sql.warehouse.dir", "C:\\softwares\\git\\pyspark\\hive").getOrCreate()
from pyspark.sql.functions ...

Latest Reply
Aviral-Bhardwaj
Esteemed Contributor III
  • 5 kudos

@ramanjaneyulu kancharla Can you please select my answer as the best answer?

  • 5 kudos
8 More Replies
Sas
by New Contributor II
  • 854 Views
  • 1 replies
  • 0 kudos

A streaming job going into an infinite loop

Hi,
Below I am trying to read data from Kafka, determine whether it is fraud or not, and then write it back to MongoDB. Below is my code, read_kafka.py:
from pyspark.sql import SparkSession
from pyspark.sql.functions import *
from pyspark.sql.types i...

Latest Reply
swethaNandan
New Contributor III
  • 0 kudos

Hi Saswata,
Can you remove the filter and see if it is printing output to the console?
kafka_df5=kafka_df4.filter(kafka_df4.status=="FRAUD")
Thanks and Regards,
Swetha Nandajan

  • 0 kudos
ankris
by New Contributor III
  • 1999 Views
  • 2 replies
  • 0 kudos

Could you please guide us on connecting to ServiceNow data in Databricks

We would like to extract data such as ticket info, resolve time, etc., from ServiceNow into Databricks. We are not finding much information in the community and would appreciate your guidance.

Latest Reply
crannow
New Contributor II
  • 0 kudos

ServiceNow offers API capabilities. You can consume the ServiceNow API within a Databricks notebook to extract data from ServiceNow. The following is a suggested prompt to use with ChatGPT to get example Python code that connects to ServiceNow's API. PROMPT: ...

  • 0 kudos
1 More Replies
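
As a rough illustration of the API route suggested in the reply above, the sketch below builds (but does not send) an HTTP request against the ServiceNow Table API for the `incident` table. It is not the code the prompt in the reply would produce. The instance name, the credentials, the chosen fields, and the use of basic authentication are all assumptions made for this example.

```python
import base64
from urllib.parse import urlencode
from urllib.request import Request


def build_incident_request(instance, user, password, fields, limit=100):
    """Build a GET request for incident records from the ServiceNow Table API.

    Nothing is sent here; the caller decides when to open the request.
    """
    query = urlencode({
        "sysparm_fields": ",".join(fields),  # restrict the columns returned
        "sysparm_limit": limit,              # cap the number of records
    })
    url = f"https://{instance}.service-now.com/api/now/table/incident?{query}"
    # Basic authentication is assumed; a real instance may require OAuth instead.
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    headers = {"Authorization": f"Basic {token}", "Accept": "application/json"}
    return Request(url, headers=headers)


# Hypothetical instance, credentials, and field names, for illustration only.
request = build_incident_request("example", "user", "pass", ["number", "resolved_at"], limit=10)
print(request.full_url)
```

In a notebook the request could be opened with `urllib.request.urlopen(request)` and the JSON body parsed with `json.loads` before being turned into a DataFrame. In practice the credentials would come from a secret store rather than being written into the code.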